Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Started by Thomas Munroalmost 4 years ago41 messages

thomas.munro@gmail.com

almost 4 years ago

Hi,

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
stuff it does safe? For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

(I noticed this incidentally while trying to follow along with the
nearby thread on 031_recovery_conflict.pl, but the question of why we
really need this of interest to me for a back-burner project I have to
try to remove all use of signals except for latches, and then remove
the signal emulation for Windows. It may turn out to be a pipe dream,
but this stuff is one of the subproblems.)

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Thomas Munro (#1)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Thomas Munro <thomas.munro@gmail.com> writes:

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
stuff it does safe? For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

Ugh. That one was safe before somebody decided we needed a hash table
for buffer refcounts, but it's surely not safe now. Which, of course,
demonstrates the folly of allowing signal handlers to call much of
anything; but especially doing so without clearly marking the called
functions as needing to be signal safe.

regards, tom lane

andres@anarazel.de

almost 4 years ago

In reply to: Tom Lane (#2)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
stuff it does safe? For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

Ugh. That one was safe before somebody decided we needed a hash table
for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.

Which, of course, demonstrates the folly of allowing signal handlers to call
much of anything; but especially doing so without clearly marking the called
functions as needing to be signal safe.

Yea. Particularly when just going through bufmgr and updating places that look
at pin counts, it's not immediately obvious that
HoldingBufferPinThatDelaysRecovery() runs in a signal handler. Partially
because RecoveryConflictInterrupt() - which is mentioned in the comment above
HoldingBufferPinThatDelaysRecovery() - sounds a lot like it's called from
ProcessInterrupts(), which doesn't run in a signal handler...

RecoveryConflictInterrupt() calls a lot of functions, some of which quite
plausibly could be changed to not be signal safe, even if they currently are.

Is there really a reason for RecoveryConflictInterrupt() to run in a signal
handler? Given that we only react to conflicts in ProcessInterrupts(), it's
not immediately obvious that we need to do anything in
RecoveryConflictInterrupt() but set some flags. There's probably some minor
efficiency gains, but that seems unconvincing.

The comments really need a rewrite - it sounds like
RecoveryConflictInterrupt() will error out itself:

/*
* If we can abort just the current subtransaction then we are
* OK to throw an ERROR to resolve the conflict. Otherwise
* drop through to the FATAL case.
*
* XXX other times that we can throw just an ERROR *may* be
* PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
* parent transactions
*
* PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
* by parent transactions and the transaction is not
* transaction-snapshot mode
*
* PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
* cursors open in parent transactions
*/

it's technically not *wrong* because it's setting up state that then leads to
ERROR / FATAL being thrown, but ...

Greetings,

Andres Freund

andres@anarazel.de

almost 4 years ago

In reply to: Andres Freund (#3)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-04-09 14:39:16 -0700, Andres Freund wrote:

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
stuff it does safe? For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

Ugh. That one was safe before somebody decided we needed a hash table
for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.

Whoa. There's way worse: StandbyTimeoutHandler() calls
SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which
acquires lwlocks etc.

Which very plausibly is the cause for the issue I'm investigating in
/messages/by-id/20220409220054.fqn5arvbeesmxdg5@alap3.anarazel.de

Greetings,

Andres Freund

thomas.munro@gmail.com

almost 4 years ago

In reply to: Andres Freund (#4)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sun, Apr 10, 2022 at 11:00 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-04-09 14:39:16 -0700, Andres Freund wrote:

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
stuff it does safe? For example, is this call stack OK (to pick one
that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

Ugh. That one was safe before somebody decided we needed a hash table
for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.

Whoa. There's way worse: StandbyTimeoutHandler() calls
SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which
acquires lwlocks etc.

Which very plausibly is the cause for the issue I'm investigating in
/messages/by-id/20220409220054.fqn5arvbeesmxdg5@alap3.anarazel.de

Huh. I wouldn't have started a separate thread for this if I'd
realised I was getting close to the cause of the CI failure... I
thought this was an incidental observation. Anyway, I made a first
attempt at fixing this SIGUSR1 problem (I think Andres is looking at
the SIGALRM problem in the other thread).

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array. Other than that,
it's much like existing examples.

The existing use of the global variable RecoveryConflictReason seems a
little woolly. Doesn't it get clobbered every time a signal arrives,
even if we determine that there is no conflict? Not sure why that's
OK, but anyway, this patch always sets it together with
RecoveryConflictPending = true.

Attachments:

0001-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From 1ba60808a23159b8d99cfec70111b6724a55e57b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 12 Apr 2022 07:33:59 +1200
Subject: [PATCH] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler.  Move the check
into the next CFI.

(There's a related problem in the recovery conflict SIGALRM handling,
for another patch.)

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/ipc/procsignal.c | 12 +++----
 src/backend/tcop/postgres.c          | 53 ++++++++++++++++++----------
 src/include/storage/procsignal.h     |  4 ++-
 src/include/tcop/tcopprot.h          |  3 +-
 4 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index f41563a0a4..ce08580b5b 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -653,22 +653,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleLogMemoryContextInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 95dc2e2c83..a89066badb 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -172,6 +172,7 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
+volatile sig_atomic_t RecoveryConflictInterruptPending[NUM_PROCSIGNALS];
 static bool RecoveryConflictPending = false;
 static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
@@ -2993,22 +2994,31 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictInterruptPending[reason] = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
+/*
+ * Check one recovery conflict reason.  This is called when the corresponding
+ * RecoveryConflictInterruptPending flags is set.  If we decide that a conflict
+ * exists, then RecoveryConflictReason and RecoveryConflictPending will be set,
+ * to be handled later in the same invocation of ProcessInterrupts().
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
 	/*
 	 * Don't joggle the elbow of proc_exit
 	 */
 	if (!proc_exit_inprogress)
 	{
-		RecoveryConflictReason = reason;
 		switch (reason)
 		{
 			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
@@ -3084,9 +3094,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 					if (IsAbortedTransactionBlockState())
 						return;
 
+					RecoveryConflictReason = reason;
 					RecoveryConflictPending = true;
 					QueryCancelPending = true;
-					InterruptPending = true;
 					break;
 				}
 
@@ -3094,9 +3104,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 				/* FALLTHROUGH */
 
 			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
+				RecoveryConflictReason = reason;
 				RecoveryConflictPending = true;
 				ProcDiePending = true;
-				InterruptPending = true;
 				break;
 
 			default:
@@ -3115,15 +3125,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
 			RecoveryConflictRetryable = false;
 	}
-
-	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
-	 */
-	SetLatch(MyLatch);
-
-	errno = save_errno;
 }
 
 /*
@@ -3147,6 +3148,22 @@ ProcessInterrupts(void)
 		return;
 	InterruptPending = false;
 
+	/*
+	 * Have we been asked to check for a recovery conflict?  Processing these
+	 * interrupts may result in RecoveryConflictPending and related variables
+	 * being set, to be handled further down.
+	 */
+	for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 i <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 ++i)
+	{
+		if (RecoveryConflictInterruptPending[i])
+		{
+			RecoveryConflictInterruptPending[i] = false;
+			ProcessRecoveryConflictInterrupt(i);
+		}
+	}
+
 	if (ProcDiePending)
 	{
 		ProcDiePending = false;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index ee636900f3..26d045950c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -37,12 +37,14 @@ typedef enum
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 87e408b719..6c3c91aeea 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -73,8 +73,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.30.2

andres@anarazel.de

almost 4 years ago

In reply to: Thomas Munro (#5)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:

Huh. I wouldn't have started a separate thread for this if I'd
realised I was getting close to the cause of the CI failure...

:)

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array. Other than that,
it's much like existing examples.

It kind of bothers me that each pending conflict has its own external function
call. It doesn't actually cost anything, because it's quite unlikely that
there's more than one pending conflict. Besides aesthetically displeasing,
it also leads to an unnecessarily large amount of code needed, because the
calls to RecoveryConflictInterrupt() can't be merged...

But that's perhaps best fixed separately.

What might actually make more sense is to just have a bitmask or something?

The existing use of the global variable RecoveryConflictReason seems a
little woolly. Doesn't it get clobbered every time a signal arrives,
even if we determine that there is no conflict? Not sure why that's
OK, but anyway, this patch always sets it together with
RecoveryConflictPending = true.

Yea. It's probably ok, kind of, because there shouldn't be multiple
outstanding conflicts with very few exceptions (deadlock and buffer pin). And
it doesn't matter that much which of those gets handled. And we'll retry
again. But brrr.

+/*
+ * Check one recovery conflict reason.  This is called when the corresponding
+ * RecoveryConflictInterruptPending flags is set.  If we decide that a conflict
+ * exists, then RecoveryConflictReason and RecoveryConflictPending will be set,
+ * to be handled later in the same invocation of ProcessInterrupts().
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
/*
* Don't joggle the elbow of proc_exit
*/
if (!proc_exit_inprogress)
{
-		RecoveryConflictReason = reason;
switch (reason)
{
case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
@@ -3084,9 +3094,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
if (IsAbortedTransactionBlockState())
return;
+ RecoveryConflictReason = reason;
RecoveryConflictPending = true;
QueryCancelPending = true;
- InterruptPending = true;
break;
}

@@ -3094,9 +3104,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* FALLTHROUGH */

case PROCSIG_RECOVERY_CONFLICT_DATABASE:
+ RecoveryConflictReason = reason;
RecoveryConflictPending = true;
ProcDiePending = true;
- InterruptPending = true;
break;

default:
@@ -3115,15 +3125,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
RecoveryConflictRetryable = false;
}

It's pretty weird that we have all this stuff that we then just check a short
while later in ProcessInterrupts() whether they've been set.

Seems like it'd make more sense to throw the error in
ProcessRecoveryConflictInterrupt(), now that it's not in a a signal handler
anymore?

/*
@@ -3147,6 +3148,22 @@ ProcessInterrupts(void)
return;
InterruptPending = false;

+	/*
+	 * Have we been asked to check for a recovery conflict?  Processing these
+	 * interrupts may result in RecoveryConflictPending and related variables
+	 * being set, to be handled further down.
+	 */
+	for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 i <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 ++i)
+	{
+		if (RecoveryConflictInterruptPending[i])
+		{
+			RecoveryConflictInterruptPending[i] = false;
+			ProcessRecoveryConflictInterrupt(i);
+		}
+	}

Hm. This seems like it shouldn't be in ProcessInterrupts(). How about checking
calling a wrapper doing all this if RecoveryConflictPending?

Greetings,

Andres Freund

thomas.munro@gmail.com

over 3 years ago

In reply to: Andres Freund (#6)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Apr 12, 2022 at 10:50 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array. Other than that,
it's much like existing examples.

It kind of bothers me that each pending conflict has its own external function
call. It doesn't actually cost anything, because it's quite unlikely that
there's more than one pending conflict. Besides aesthetically displeasing,
it also leads to an unnecessarily large amount of code needed, because the
calls to RecoveryConflictInterrupt() can't be merged...

Ok, in this version there's two levels of flag:
RecoveryConflictPending, so we do nothing if that's not set, and then
the loop over RecoveryConflictPendingReasons is moved into
ProcessRecoveryConflictInterrupts(). Better?

What might actually make more sense is to just have a bitmask or something?

Yeah, in fact I'm exploring something like that in later bigger
redesign work[1]/messages/by-id/CA+hUKG+3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ@mail.gmail.com that gets rid of signal handlers. Here I'm looking
for something simple and potentially back-patchable and I don't want
to have to think about async signal safety of bit-level manipulations.

It's pretty weird that we have all this stuff that we then just check a short
while later in ProcessInterrupts() whether they've been set.

Seems like it'd make more sense to throw the error in
ProcessRecoveryConflictInterrupt(), now that it's not in a a signal handler
anymore?

Yeah. The thing that was putting me off doing that (and caused me to
get kinda stuck in the valley of indecision for a while here, sorry
about that) aside from trying to keep the diff small, was the need to
replicate this self-loathing code in a second place:

if (QueryCancelPending && QueryCancelHoldoffCount != 0)
{
/*
* Re-arm InterruptPending so that we process the cancel request as
* soon as we're done reading the message. (XXX this is seriously
* ugly: it complicates INTERRUPTS_CAN_BE_PROCESSED(), and it means we
* can't use that macro directly as the initial test in this function,
* meaning that this code also creates opportunities for other bugs to
* appear.)
*/

But I have now tried doing that anyway, and I hope the simplification
in other ways makes it worth it. Thoughts, objections?

/*
@@ -3147,6 +3148,22 @@ ProcessInterrupts(void)
return;
InterruptPending = false;

+     /*
+      * Have we been asked to check for a recovery conflict?  Processing these
+      * interrupts may result in RecoveryConflictPending and related variables
+      * being set, to be handled further down.
+      */
+     for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST;
+              i <= PROCSIG_RECOVERY_CONFLICT_LAST;
+              ++i)
+     {
+             if (RecoveryConflictInterruptPending[i])
+             {
+                     RecoveryConflictInterruptPending[i] = false;
+                     ProcessRecoveryConflictInterrupt(i);
+             }
+     }

Hm. This seems like it shouldn't be in ProcessInterrupts(). How about checking
calling a wrapper doing all this if RecoveryConflictPending?

I moved the loop into ProcessRecoveryConflictInterrupt() and added an
"s" to the latter's name. It already had the right indentation level
to contain a loop, once I realised that the test of
proc_exit_inprogress must be redundant.

Better?

[1]: /messages/by-id/CA+hUKG+3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ@mail.gmail.com

Attachments:

v2-0001-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From fc9b7c1c68404eede7161615e6a7b5ac2155d0ba Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v2] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  Move the check into the next
CFI.

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/ipc/procsignal.c |  12 +-
 src/backend/tcop/postgres.c          | 191 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 4 files changed, 107 insertions(+), 103 deletions(-)

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 00d66902d8..1268eeba7c 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -644,22 +644,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleLogMemoryContextInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 304cce135a..324ca816e1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -171,10 +171,9 @@ static const char *userDoption = NULL;	/* -D switch */
 static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
-/* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+/* whether or not, and why, we were cancelled by conflict with recovery */
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -193,7 +192,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2465,9 +2463,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -2992,22 +2990,46 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
+
+/*
+ * Check individual recovery conflict reasons.  Called when
+ * RecoveryConflictPending is set.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Don't joggle the elbow of proc_exit
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	if (!proc_exit_inprogress)
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
+
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
 	{
-		RecoveryConflictReason = reason;
+		if (!RecoveryConflictPendingReasons[reason])
+			continue;
+		RecoveryConflictPendingReasons[reason] = false;
+
 		switch (reason)
 		{
 			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
@@ -3016,7 +3038,7 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 				 * If we aren't waiting for a lock we can never deadlock.
 				 */
 				if (!IsWaitingForLock())
-					return;
+					continue;
 
 				/* Intentional fall through to check wait for pin */
 				/* FALLTHROUGH */
@@ -3039,7 +3061,7 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
 						GetStartupBufferPinWaitBufId() < 0)
 						CheckDeadLockAlert();
-					return;
+					continue;
 				}
 
 				MyProc->recoveryConflictPending = true;
@@ -3055,7 +3077,7 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 				 * If we aren't in a transaction any longer then ignore.
 				 */
 				if (!IsTransactionOrTransactionBlock())
-					return;
+					continue;
 
 				/*
 				 * If we can abort just the current subtransaction then we are
@@ -3081,11 +3103,47 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 					 * subtransactions, which must cause FATAL, currently.
 					 */
 					if (IsAbortedTransactionBlockState())
-						return;
+						continue;
+
+					/*
+					 * If a recovery conflict happens while we are waiting for
+					 * input from the client, the client is presumably just
+					 * sitting idle in a transaction, preventing recovery from
+					 * making progress.  Terminate the connection to dislodge
+					 * it.
+					 */
+					if (DoingCommandRead)
+					{
+						LockErrorCleanup();
+						pgstat_report_recovery_conflict(reason);
+						ereport(FATAL,
+								(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+								 errmsg("terminating connection due to conflict with recovery"),
+								 errdetail_recovery_conflict(reason),
+								 errhint("In a moment you should be able to reconnect to the"
+										 " database and repeat your command.")));
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
+						continue;
+					}
+
+					/* We can use ERROR, because this conflict is retryable. */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
 
@@ -3093,36 +3151,24 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 				/* FALLTHROUGH */
 
 			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
+				pgstat_report_recovery_conflict(reason);
+				if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
+					ereport(FATAL,
+							(errcode(ERRCODE_DATABASE_DROPPED),
+							 errmsg("terminating connection due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
+				else
+					ereport(FATAL,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("terminating connection due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 				break;
 
 			default:
 				elog(FATAL, "unrecognized conflict mode: %d",
 					 (int) reason);
 		}
-
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
-
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
 	}
-
-	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
-	 */
-	SetLatch(MyLatch);
-
-	errno = save_errno;
 }
 
 /*
@@ -3146,6 +3192,9 @@ ProcessInterrupts(void)
 		return;
 	InterruptPending = false;
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (ProcDiePending)
 	{
 		ProcDiePending = false;
@@ -3177,24 +3226,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3237,31 +3268,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3320,16 +3333,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index ee636900f3..26d045950c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -37,12 +37,14 @@ typedef enum
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 87e408b719..6c3c91aeea 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -73,8 +73,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.30.2

andres@anarazel.de

over 3 years ago

In reply to: Thomas Munro (#7)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

It'd be cool to commit and backpatch this - I'd like to re-enable the conflict
tests in the backbranches, and I don't think we want to do so with this issue
in place.

On 2022-05-10 16:39:11 +1200, Thomas Munro wrote:

On Tue, Apr 12, 2022 at 10:50 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:

Instead of bothering to create N different XXXPending variables for
the different conflict "reasons", I used an array. Other than that,
it's much like existing examples.

It kind of bothers me that each pending conflict has its own external function
call. It doesn't actually cost anything, because it's quite unlikely that
there's more than one pending conflict. Besides aesthetically displeasing,
it also leads to an unnecessarily large amount of code needed, because the
calls to RecoveryConflictInterrupt() can't be merged...

Ok, in this version there's two levels of flag:
RecoveryConflictPending, so we do nothing if that's not set, and then
the loop over RecoveryConflictPendingReasons is moved into
ProcessRecoveryConflictInterrupts(). Better?

I think so.

I don't particularly like the Handle/ProcessRecoveryConflictInterrupt() split,
naming-wise. I don't think Handle vs Process indicates something meaningful?
Maybe s/Interrupt/Signal/ for the signal handler one could help?

It *might* look a tad cleaner to have the loop in a separate function from the
existing code. I.e. a +ProcessRecoveryConflictInterrupts() that calls
ProcessRecoveryConflictInterrupts().

What might actually make more sense is to just have a bitmask or something?

Yeah, in fact I'm exploring something like that in later bigger
redesign work[1] that gets rid of signal handlers. Here I'm looking
for something simple and potentially back-patchable and I don't want
to have to think about async signal safety of bit-level manipulations.

Makes sense.

/*
@@ -3146,6 +3192,9 @@ ProcessInterrupts(void)
return;
InterruptPending = false;
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
if (ProcDiePending)
{
ProcDiePending = false;

Should the ProcessRecoveryConflictInterrupts() call really be before the
ProcDiePending check?

Greetings,

Andres Freund

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Andres Freund (#8)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sun, May 22, 2022 at 05:10:01PM -0700, Andres Freund wrote:

On 2022-05-10 16:39:11 +1200, Thomas Munro wrote:

Ok, in this version there's two levels of flag:
RecoveryConflictPending, so we do nothing if that's not set, and then
the loop over RecoveryConflictPendingReasons is moved into
ProcessRecoveryConflictInterrupts(). Better?

I think so.

I don't particularly like the Handle/ProcessRecoveryConflictInterrupt() split,
naming-wise. I don't think Handle vs Process indicates something meaningful?
Maybe s/Interrupt/Signal/ for the signal handler one could help?

Handle is more consistent with the other types of interruptions in the
SIGUSR1 handler, so the name proposed in the patch in not that
confusing to me. And so does Process, in spirit with
ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt().
While on it, is postgres.c the best home for
HandleRecoveryConflictInterrupt()? That's a very generic file, for
starters. Not related to the actual bug, just asking.

It *might* look a tad cleaner to have the loop in a separate function from the
existing code. I.e. a +ProcessRecoveryConflictInterrupts() that calls
ProcessRecoveryConflictInterrupts().

Agreed that it would be a bit cleaner to keep the internals in a
different routine.

Yeah, in fact I'm exploring something like that in later bigger
redesign work[1] that gets rid of signal handlers. Here I'm looking
for something simple and potentially back-patchable and I don't want
to have to think about async signal safety of bit-level manipulations.

Makes sense.

+1.

Also note that bufmgr.c mentions RecoveryConflictInterrupt() in the
top comment of HoldingBufferPinThatDelaysRecovery().

Should the processing of PROCSIG_RECOVERY_CONFLICT_DATABASE mention
that FATAL is used because we are never going to retry the conflict as
the database has been dropped? Getting rid of
RecoveryConflictRetryable makes the code easier to go through.
--
Michael

robertmhaas@gmail.com

over 3 years ago

In reply to: Michael Paquier (#9)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote:

Handle is more consistent with the other types of interruptions in the
SIGUSR1 handler, so the name proposed in the patch in not that
confusing to me. And so does Process, in spirit with
ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt().
While on it, is postgres.c the best home for
HandleRecoveryConflictInterrupt()? That's a very generic file, for
starters. Not related to the actual bug, just asking.

Yeah, there's existing precedent for this kind of split in, for
example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I
think the idea is that "process" is supposed to sound like the more
involved part of the operation, whereas "handle" is supposed to sound
like the initial response to the signal.

I'm not sure it's the clearest possible naming, but naming things is
hard, and this patch is apparently not inventing a new way to do it.

--
Robert Haas
EDB: http://www.enterprisedb.com

thomas.munro@gmail.com

over 3 years ago

In reply to: Robert Haas (#10)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Here's a new version, but there's something wrong that I haven't
figured out yet (see CI link below).

Here's one thing I got a bit confused about along the way, but it
seems the comment was just wrong:

+                       /*
+                        * If we can abort just the current
subtransaction then we are OK
+                        * to throw an ERROR to resolve the conflict.
Otherwise drop
+                        * through to the FATAL case.
...
+                       if (!IsSubTransaction())
...
+                               ereport(ERROR,

Surely this was meant to say, "If we're not in a subtransaction then
...", right? Changed.

I thought of a couple more simplifications for the big switch
statement in ProcessRecoveryConflictInterrupt(). The special case for
DoingCommandRead can be changed to fall through, instead of needing a
separate ereport(FATAL).

I also folded the two ereport(FATAL) calls in the CONFLICT_DATABASE
case into one, since they differ only in errcode().

+                                       (errcode(reason ==
PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+
ERRCODE_DATABASE_DROPPED :
+
ERRCODE_T_R_SERIALIZATION_FAILURE),

Now we're down to just one ereport(FATAL), one ereport(ERROR), and a
couple of ways to give up without erroring. I think this makes the
logic a lot easier to follow?

I'm confused about proc->recoveryConflictPending: the startup process
sets it, and sometimes the interrupt receiver sets it too, and it
causes errdetail() to be clobbered on abort (for any reason), even
though we bothered to set it carefully for the recovery conflict
ereport calls. Or something like that. I haven't changed anything
about that in this patch, though.

Problem: I saw 031_recovery_conflict.pl time out while waiting for a
buffer pin conflict, but so far once only, on CI:

https://cirrus-ci.com/task/5956804860444672

timed out waiting for match: (?^:User was holding shared buffer pin
for too long) at t/031_recovery_conflict.pl line 367.

Hrmph. Still trying to reproduce that, which may be a bug in this
patch, a bug in the test or a pre-existing problem. Note that
recovery didn't say something like:

2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
after 11.197 ms: recovery conflict on buffer pin

(That's what I'd expect to see in
https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
if the startup process had decided to send the signal).

... so it seems like the problem in that run is upstream of the interrupt stuff.

Other things changed in response to feedback (quoting from several
recent messages):

On Thu, Jun 16, 2022 at 5:01 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote:

Handle is more consistent with the other types of interruptions in the
SIGUSR1 handler, so the name proposed in the patch in not that
confusing to me. And so does Process, in spirit with
ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt().
While on it, is postgres.c the best home for
HandleRecoveryConflictInterrupt()? That's a very generic file, for
starters. Not related to the actual bug, just asking.

Yeah, there's existing precedent for this kind of split in, for
example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I
think the idea is that "process" is supposed to sound like the more
involved part of the operation, whereas "handle" is supposed to sound
like the initial response to the signal.

Thanks both for looking. Yeah, I was trying to keep with the existing
convention here (though admittedly we're not 100% consistent on this,
something to tidy up separately perhaps).

On Wed, Jun 15, 2022 at 5:51 PM Michael Paquier <michael@paquier.xyz> wrote:

On Sun, May 22, 2022 at 05:10:01PM -0700, Andres Freund wrote:

It *might* look a tad cleaner to have the loop in a separate function from the
existing code. I.e. a +ProcessRecoveryConflictInterrupts() that calls
ProcessRecoveryConflictInterrupts().

Agreed that it would be a bit cleaner to keep the internals in a
different routine.

Alright, I split it into two functions: one with an 's' in the name to
do the looping, and one without 's' to process an individual interrupt
reason. Makes the patch harder to read because the indentation level
changes...

Also note that bufmgr.c mentions RecoveryConflictInterrupt() in the
top comment of HoldingBufferPinThatDelaysRecovery().

Fixed.

Should the processing of PROCSIG_RECOVERY_CONFLICT_DATABASE mention
that FATAL is used because we are never going to retry the conflict as
the database has been dropped?

OK, note added.

Getting rid of
RecoveryConflictRetryable makes the code easier to go through.

Yeah, all the communication through global variables was really
confusing, and also subtly wrong (that global reason gets clobbered
with incorrect values), and that retryable variable was hard to
follow.

On Mon, May 23, 2022 at 12:10 PM Andres Freund <andres@anarazel.de> wrote:

On 2022-05-10 16:39:11 +1200, Thomas Munro wrote:
@@ -3146,6 +3192,9 @@ ProcessInterrupts(void)
return;
InterruptPending = false;
+     if (RecoveryConflictPending)
+             ProcessRecoveryConflictInterrupts();
+
if (ProcDiePending)
{
ProcDiePending = false;
Should the ProcessRecoveryConflictInterrupts() call really be before the
ProcDiePending check?

I don't think it's important which of (say) a statement timeout and a
recovery conflict that arrive around the same time takes priority, but
on reflection it was an ugly place to put it, and it seems tidier to
move it down the function a bit further, where other various special
interrupts are handled after the "main" and original die/cancel ones.

Attachments:

v3-0001-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From a3b378b4aff40546062ac073f5c14c661e7296e5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v3] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  Move all recovery conflict
checking logic into the next CFI following the standard pattern.

Back-patch to all supported releases.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  12 +-
 src/backend/tcop/postgres.c          | 312 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 172 insertions(+), 163 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index ae13011d27..50db212e15 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4357,8 +4357,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 21a9fc0fdd..c6fedd94ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -658,22 +658,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleLogMemoryContextInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8b6b5bbaaa..409911eada 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -172,9 +172,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -193,7 +192,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2465,9 +2463,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -2992,137 +2990,190 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
 
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  We have a top-level
+					 * transaction that we can abort and a conflict that isn't
+					 * inherently non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
+		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
 
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3177,24 +3228,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3237,31 +3270,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3320,16 +3335,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3345,6 +3350,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index ee636900f3..26d045950c 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -37,12 +37,14 @@ typedef enum
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 70d9dab25b..9823ee64f7 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -73,8 +73,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.35.1

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Thomas Munro (#11)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Jun 21, 2022 at 05:22:05PM +1200, Thomas Munro wrote:

Here's one thing I got a bit confused about along the way, but it
seems the comment was just wrong:
+                       /*
+                        * If we can abort just the current
subtransaction then we are OK
+                        * to throw an ERROR to resolve the conflict.
Otherwise drop
+                        * through to the FATAL case.
...
+                       if (!IsSubTransaction())
...
+                               ereport(ERROR,
Surely this was meant to say, "If we're not in a subtransaction then
...", right? Changed.

Indeed, the code does something else than what the comment says, aka
generating an ERROR if the process is not in a subtransaction,
ignoring already aborted transactions (aborted subtrans go to FATAL)
and throwing a FATAL in the other cases. So your change looks right.

I thought of a couple more simplifications for the big switch
statement in ProcessRecoveryConflictInterrupt(). The special case for
DoingCommandRead can be changed to fall through, instead of needing a
separate ereport(FATAL).

The extra business with QueryCancelHoldoffCount and DoingCommandRead
is the only addition for the snapshot, lock and tablespace conflict
handling part. I don't see why a reason why that could be wrong on a
close lookup. Anyway, why don't you check QueryCancelPending on top
of QueryCancelHoldoffCount?

Now we're down to just one ereport(FATAL), one ereport(ERROR), and a
couple of ways to give up without erroring. I think this makes the
logic a lot easier to follow?

Agreed that it looks like a gain in clarity.
--
Michael

thomas.munro@gmail.com

over 3 years ago

In reply to: Michael Paquier (#12)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Jun 21, 2022 at 7:44 PM Michael Paquier <michael@paquier.xyz> wrote:

The extra business with QueryCancelHoldoffCount and DoingCommandRead
is the only addition for the snapshot, lock and tablespace conflict
handling part. I don't see why a reason why that could be wrong on a
close lookup. Anyway, why don't you check QueryCancelPending on top
of QueryCancelHoldoffCount?

The idea of this patch is to make ProcessRecoveryConflictInterrupt()
throw its own ERROR, instead of setting QueryCancelPending (as an
earlier version of the patch did). It still has to respect
QueryCancelHoldoffCount, though, to avoid emitting an ERROR at bad
times for the fe/be protocol.

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Thomas Munro (#13)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Jun 21, 2022 at 11:02:57PM +1200, Thomas Munro wrote:

On Tue, Jun 21, 2022 at 7:44 PM Michael Paquier <michael@paquier.xyz> wrote:

The extra business with QueryCancelHoldoffCount and DoingCommandRead
is the only addition for the snapshot, lock and tablespace conflict
handling part. I don't see why a reason why that could be wrong on a
close lookup. Anyway, why don't you check QueryCancelPending on top
of QueryCancelHoldoffCount?

The idea of this patch is to make ProcessRecoveryConflictInterrupt()
throw its own ERROR, instead of setting QueryCancelPending (as an
earlier version of the patch did). It still has to respect
QueryCancelHoldoffCount, though, to avoid emitting an ERROR at bad
times for the fe/be protocol.

Yeah, I was reading through v3 and my brain questioned the
inconsistency, but I can see that v2 already did that and I have also
looked at it. Anyway, my concern here is that the code becomes more
dependent on the ordering of ProcessRecoveryConflictInterrupt() and
the code path checking for QueryCancelPending in ProcessInterrupts().
With the patch, we should always have QueryCancelPending set to false,
as long as there are no QueryCancelHoldoffCount. Perhaps an extra
assertion for QueryCancelPending could be added at the beginning of
ProcessRecoveryConflictInterrupts(), in combination of the one already
present for InterruptHoldoffCount. I agree that's a minor point,
though.
--
Michael

thomas.munro@gmail.com

over 3 years ago

In reply to: Michael Paquier (#14)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Wed, Jun 22, 2022 at 1:04 PM Michael Paquier <michael@paquier.xyz> wrote:

With the patch, we should always have QueryCancelPending set to false,
as long as there are no QueryCancelHoldoffCount. Perhaps an extra
assertion for QueryCancelPending could be added at the beginning of
ProcessRecoveryConflictInterrupts(), in combination of the one already
present for InterruptHoldoffCount. I agree that's a minor point,
though.

But QueryCancelPending can be set to true at any time by
StatementCancelHandler(), if we receive SIGINT.

andres@anarazel.de

over 3 years ago

In reply to: Thomas Munro (#11)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:

Problem: I saw 031_recovery_conflict.pl time out while waiting for a
buffer pin conflict, but so far once only, on CI:

https://cirrus-ci.com/task/5956804860444672

timed out waiting for match: (?^:User was holding shared buffer pin
for too long) at t/031_recovery_conflict.pl line 367.

Hrmph. Still trying to reproduce that, which may be a bug in this
patch, a bug in the test or a pre-existing problem. Note that
recovery didn't say something like:

2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
after 11.197 ms: recovery conflict on buffer pin

(That's what I'd expect to see in
https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
if the startup process had decided to send the signal).

... so it seems like the problem in that run is upstream of the interrupt stuff.

Odd. The only theory I have so far is that the manual vacuum on the primary
somehow decided to skip the page, and thus didn't trigger a conflict. Because
clearly replay progressed past the records of the VACUUM. Perhaps we should
use VACUUM VERBOSE? In contrast to pg_regress tests that should be
unproblematic?

Greetings,

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#16)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:

On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:

Problem: I saw 031_recovery_conflict.pl time out while waiting for a
buffer pin conflict, but so far once only, on CI:

https://cirrus-ci.com/task/5956804860444672

timed out waiting for match: (?^:User was holding shared buffer pin
for too long) at t/031_recovery_conflict.pl line 367.

Hrmph. Still trying to reproduce that, which may be a bug in this
patch, a bug in the test or a pre-existing problem. Note that
recovery didn't say something like:

2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
after 11.197 ms: recovery conflict on buffer pin

(That's what I'd expect to see in
https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
if the startup process had decided to send the signal).

... so it seems like the problem in that run is upstream of the interrupt stuff.

Odd. The only theory I have so far is that the manual vacuum on the primary
somehow decided to skip the page, and thus didn't trigger a conflict. Because
clearly replay progressed past the records of the VACUUM. Perhaps we should
use VACUUM VERBOSE? In contrast to pg_regress tests that should be
unproblematic?

I saw a couple failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However it adds a lot more distributions,
and the added ones run in docker containers on a shared host, presumably
adding a lot of noise. I saw this more frequently when I accidentally had the
test runs at the number of CPUs in the host, rather than the allotted CPUs in
the container.

That made me look more into these issues. I played with adding a pg_usleep()
to RecoveryConflictInterrupt() to simulate slow signal delivery.

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
completely swamp the target(s) on a busy system. This surely is exascerbated
by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace
does seem like a bad idea.

- we process the same recovery conflict (not a single signal, but a single
"real conflict") multiple times in the target of a conflict, presumably
while handling the error. That includes handling the same interrupt once as
an ERROR and once as FATAL.

E.g.

2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;

note that the startup process considers the conflict resolved by the time
the backend handles the interrupt.

I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.

the FATAL one seems like it might at least partially be due to
RecoveryConflictPending not being reset in at least some of the FATAL
recovery conflict paths.

It seems pretty obvious that the proc_exit_inprogress check in
RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
are thrown. But that won't help, because it turns out, we don't yet set that
necessarily.

Look at this stack from an assertion in ProcessInterrupts() ensuring that
the same FATAL isn't raised twice:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007fd47897b546 in __GI_abort () at abort.c:79
#2 0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3 0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4 0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5 0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6 0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7 0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8 0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9 0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)

reporting a FATAL error in process of reporting a FATAL error. Yeha.

I assume this could lead to sending out the same message quite a few times.

This is quite the mess.

Greetings,

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#17)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

HHi,

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:

I saw a couple failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However it adds a lot more distributions,
and the added ones run in docker containers on a shared host, presumably
adding a lot of noise. I saw this more frequently when I accidentally had the
test runs at the number of CPUs in the host, rather than the allotted CPUs in
the container.

That made me look more into these issues. I played with adding a pg_usleep()
to RecoveryConflictInterrupt() to simulate slow signal delivery.

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
completely swamp the target(s) on a busy system. This surely is exascerbated
by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace
does seem like a bad idea.

This one is also implicated in
/messages/by-id/20220701154105.jjfutmngoedgiad3@alvherre.pgsql
and related issues.

Besides being very short, it also seems like a bad idea to wait when we might
not need to? Seems we should only wait if we subsequently couldn't get the
lock?

Misleadingly WaitExceedsMaxStandbyDelay() also contains a usleep, which at
least I wouldn't expect given its name.

A minimal fix would be to increase the wait time, similar how it is done with
standbyWait_us?

Medium term it seems we ought to set the startup process's latch when handling
a conflict, and use a latch wait. But avoiding races probably isn't quite
trivial.

- we process the same recovery conflict (not a single signal, but a single
"real conflict") multiple times in the target of a conflict, presumably
while handling the error. That includes handling the same interrupt once as
an ERROR and once as FATAL.

E.g.

2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;

note that the startup process considers the conflict resolved by the time
the backend handles the interrupt.

I guess the reason we first get an ERROR and then a FATAL is that the second
iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
because we end up there after handling the first error? And that's a FATAL.

I suspect that Thomas' fix will address at least part of this, as the check
whether we're still waiting for a lock will be made just before the error is
thrown.

I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.

the FATAL one seems like it might at least partially be due to
RecoveryConflictPending not being reset in at least some of the FATAL
recovery conflict paths.

It seems pretty obvious that the proc_exit_inprogress check in
RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
are thrown. But that won't help, because it turns out, we don't yet set that
necessarily.

Look at this stack from an assertion in ProcessInterrupts() ensuring that
the same FATAL isn't raised twice:

#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007fd47897b546 in __GI_abort () at abort.c:79
#2 0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3 0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4 0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5 0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6 0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7 0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8 0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9 0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)

reporting a FATAL error in process of reporting a FATAL error. Yeha.

I assume this could lead to sending out the same message quite a few
times.

This seems like it needs to be fixed in elog.c. ISTM that at the very least we
ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL.

Greetings,

Andres Freund

thomas.munro@gmail.com

over 3 years ago

In reply to: Andres Freund (#18)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Jul 2, 2022 at 11:18 AM Andres Freund <andres@anarazel.de> wrote:

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
completely swamp the target(s) on a busy system. This surely is exascerbated
by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace
does seem like a bad idea.

This one is also implicated in
/messages/by-id/20220701154105.jjfutmngoedgiad3@alvherre.pgsql
and related issues.

Besides being very short, it also seems like a bad idea to wait when we might
not need to? Seems we should only wait if we subsequently couldn't get the
lock?

Misleadingly WaitExceedsMaxStandbyDelay() also contains a usleep, which at
least I wouldn't expect given its name.

A minimal fix would be to increase the wait time, similar how it is done with
standbyWait_us?

Medium term it seems we ought to set the startup process's latch when handling
a conflict, and use a latch wait. But avoiding races probably isn't quite
trivial.

Yeah, I had the same thought; it's easy to criticise the current
collateral damage maximising design, but a whole project to come up
with a good race-free precise design. We should do that, though.

I guess the reason we first get an ERROR and then a FATAL is that the second
iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
because we end up there after handling the first error? And that's a FATAL.

I suspect that Thomas' fix will address at least part of this, as the check
whether we're still waiting for a lock will be made just before the error is
thrown.

That seems right.

reporting a FATAL error in process of reporting a FATAL error. Yeha.

I assume this could lead to sending out the same message quite a few
times.

This seems like it needs to be fixed in elog.c. ISTM that at the very least we
ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL.

That seems to make sense.

About my patch... even though it solves a couple of problems now
identified, I found an architectural problem that I don't have a
solution for yet, which stopped me in my tracks a few weeks back. I
need to find a way forward that is back-patchable.

Recap: The basic concept here is to kick all "real work" out of
signal handlers, because that work is unsafe in that context. So
instead of deciding whether we need to cancel the current query at the
next CFI by setting QueryCancelPending, we defer the whole decision to
the next CFI. Sometimes the decision is that we don't need to do
anything, and the CFI returns and execution continues normally.

The problem is that there are a couple of parts of our tree that don't
use a standard CFI, but are interrupted by looking for
QueryCancelPending directly. syncrep.c is one, but I don't believe
you could be blocked there while recovery is in progress, and
regcomp.c is another. (There was a third case relating to that
posix_fallocate() problem report you mentioned above, but 4518c798
removed that). The regular expression machinery is capable of
consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that
would never be true.

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Thomas Munro (#19)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Thomas Munro <thomas.munro@gmail.com> writes:

... The regular expression machinery is capable of
consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that
would never be true.

Surely that can't be too hard to fix. We might have to refactor
the code around QueryCancelPending a little bit so that callers
can ask "do we need a query cancel now?" without actually triggering
a longjmp ... but why would that be problematic?

regards, tom lane

noah@leadboat.com

about 3 years ago

In reply to: Tom Lane (#20)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

... The regular expression machinery is capable of
consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that
would never be true.

Surely that can't be too hard to fix. We might have to refactor
the code around QueryCancelPending a little bit so that callers
can ask "do we need a query cancel now?" without actually triggering
a longjmp ... but why would that be problematic?

It could work. The problems are like those of making code safe to run in a
signal handler. You can use e.g. snprintf in rcancelrequested(), but you
still can't use palloc() or ereport(). I see at least these strategies:

1. Accept that recovery conflict checks run after a regex call completes.
2. Have rcancelrequested() return true unconditionally if we need a conflict
check. If there's no actual conflict, restart the regex.
3. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), accept the leak of regex mallocs.
4. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), escalate to FATAL.
5. Write the conflict check code to dutifully avoid longjmp().
6. Convert src/backend/regex to use palloc, so longjmp() is fine.

I would tend to pick (3). (6) could come later and remove the drawback of
(3). Does one of those unblock the patch, or not?

===

I found this thread because $SUBJECT is causing more buildfarm failures
lately. Here are just the ones with symptom "timed out waiting for match:
(?^:User was holding a relation lock for too long)":

sysname │ snapshot │ branch │ bfurl
───────────┼─────────────────────┼───────────────┼────────────────────────────────────────────────────────────────────────────────────────────────
wrasse │ 2022-09-16 09:19:06 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-09-16%2009%3A19%3A06
francolin │ 2022-09-24 02:02:23 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2022-09-24%2002%3A02%3A23
wrasse │ 2022-10-19 08:49:16 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-10-19%2008%3A49%3A16
wrasse │ 2022-11-16 16:59:23 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-16%2016%3A59%3A23
wrasse │ 2022-11-17 09:58:48 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-17%2009%3A58%3A48
wrasse │ 2022-11-21 22:17:20 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-21%2022%3A17%3A20
wrasse │ 2022-11-22 21:52:26 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-22%2021%3A52%3A26
wrasse │ 2022-11-25 09:16:44 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-25%2009%3A16%3A44
wrasse │ 2022-12-04 23:33:26 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-04%2023%3A33%3A26
wrasse │ 2022-12-07 11:48:54 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-07%2011%3A48%3A54
wrasse │ 2022-12-07 20:58:49 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-07%2020%3A58%3A49
wrasse │ 2022-12-09 12:19:40 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-09%2012%3A19%3A40
wrasse │ 2022-12-09 15:29:45 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-09%2015%3A29%3A45
wrasse │ 2022-12-15 09:29:52 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-15%2009%3A29%3A52
wrasse │ 2022-12-23 07:37:06 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2007%3A37%3A06
wrasse │ 2022-12-23 10:32:05 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2010%3A32%3A05
wrasse │ 2022-12-23 17:47:17 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2017%3A47%3A17
(17 rows)

I can reproduce that symptom reliably, on GNU/Linux, with the attached patch
adding sleeps. The key log bit:

2022-09-16 11:50:37.338 CEST [15022:4] 031_recovery_conflict.pl LOG: statement: BEGIN;
2022-09-16 11:50:37.339 CEST [15022:5] 031_recovery_conflict.pl LOG: statement: LOCK TABLE test_recovery_conflict_table1 IN ACCESS SHARE MODE;
2022-09-16 11:50:37.341 CEST [15022:6] 031_recovery_conflict.pl LOG: statement: SELECT 1;
2022-09-16 11:50:38.076 CEST [14880:17] LOG: recovery still waiting after 11.482 ms: recovery conflict on lock
2022-09-16 11:50:38.076 CEST [14880:18] DETAIL: Conflicting process: 15022.
2022-09-16 11:50:38.076 CEST [14880:19] CONTEXT: WAL redo at 0/34243F0 for Standby/LOCK: xid 733 db 16385 rel 16386
2022-09-16 11:50:38.196 CEST [15022:7] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-09-16 11:50:38.196 CEST [15022:8] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-09-16 11:50:38.196 CEST [15022:9] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-09-16 11:50:38.197 CEST [15022:10] 031_recovery_conflict.pl LOG: disconnection: session time: 0:00:01.041 user=nm database=test_db host=[local]
2022-09-16 11:50:38.198 CEST [14880:20] LOG: recovery finished waiting after 132.886 ms: recovery conflict on lock

The second DETAIL should be "User was holding a relation lock for too long."
The backend in question is idle in transaction. RecoveryConflictInterrupt()
for PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK won't see IsWaitingForLock(),
so it will find no conflict. However, RecoveryConflictReason remains
clobbered, hence the wrong DETAIL message. Incidentally, the affected test
contains comment "# DROP TABLE containing block which standby has in a pinned
buffer". The standby holds no pin at that moment; the LOCK TABLE pins system
catalog pages, but it drops every pin it acquires.

Attachments:

repro-race-RecoveryConflictInterrupt-v0.patchtext/plain; charset=us-asciiDownload

Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7767657..a1c0d38 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -638,6 +638,8 @@ void
 procsignal_sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
+	static bool slept = false;
+	bool		got;
 
 	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
 		HandleCatchupInterrupt();
@@ -663,13 +665,21 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
+	got = CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+	if (got && !slept)
+	{
+		slept = true;
+		/* probably must exceed max_standby_streaming_delay */
+		pg_usleep(60 * 1000);
+	}
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
-	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
+	if (got)
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f43229d..204ae80 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -854,6 +854,7 @@ ResolveRecoveryConflictWithBufferPin(void)
 		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		pg_usleep(90 * 1000);
 	}
 
 	/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01d264b..854e0db 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3003,6 +3003,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
 {
 	int			save_errno = errno;
 
+	/* Not elog(), which would process interrupts. */
+	fprintf(stderr, "reason = %d inprog=%d\n", reason, proc_exit_inprogress);
+
 	/*
 	 * Don't joggle the elbow of proc_exit
 	 */

thomas.munro@gmail.com

about 3 years ago

In reply to: Noah Misch (#21)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Thu, Dec 29, 2022 at 9:40 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

... The regular expression machinery is capable of
consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that
would never be true.

Surely that can't be too hard to fix. We might have to refactor
the code around QueryCancelPending a little bit so that callers
can ask "do we need a query cancel now?" without actually triggering
a longjmp ... but why would that be problematic?

It could work. The problems are like those of making code safe to run in a
signal handler. You can use e.g. snprintf in rcancelrequested(), but you
still can't use palloc() or ereport(). I see at least these strategies:

1. Accept that recovery conflict checks run after a regex call completes.
2. Have rcancelrequested() return true unconditionally if we need a conflict
check. If there's no actual conflict, restart the regex.
3. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), accept the leak of regex mallocs.
4. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), escalate to FATAL.
5. Write the conflict check code to dutifully avoid longjmp().
6. Convert src/backend/regex to use palloc, so longjmp() is fine.

Thanks! I appreciate the help getting unstuck here. I'd thought
about some of these but not all. I also considered a couple more:

7. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is
true, and copy the error somewhere to be re-thrown later after the
regexp code exits with REG_CANCEL.
8. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is
true, and call a new regexp function that will free everything before
re-throwing.

After Tom's response I spent some time trying to figure out how to
make a SOFT_CHECK_FOR_INTERRUPTS(), which would return a value to
indicate that it would like to throw. I think it would need to re-arm
various flags and introduce a programming rule for all interrupt
processing routines that if they fired once under a soft check they
must fire again later under a non-soft check. That all seems a bit
complicated, and a general mechanism like that seemed like overkill
for a single user, which led me to idea #7.

Idea #8 is a realisation that twisting oneself into a pretzel to avoid
having to change the regexp code or its REG_CANCEL control flow may be
a bit silly. If the only thing it really needs to do is free some
memory, maybe the regexp module should provide a function that frees
everything that is safe to call from our rcancelrequested callback, so
we can do so before we longjmp back to Kansas. Then the REG_CANCEL
code paths would be effectively unreachable in PostgreSQL. I don't
know if it's better or worse than your idea #6, "use palloc instead,
it already has garbage collection, duh", but it's a different take on
the same realisation that this is just about free().

I guess idea #6 must be pretty easy to try: just point that MALLOC()
macro to palloc(), and do a plain old CFI() in rcancelrequested().
Why do you suggest #3 as an interim measure? Do we fear that palloc()
might hurt regexp performance?

I can reproduce that symptom reliably, on GNU/Linux, with the attached patch
adding sleeps. The key log bit:

2022-09-16 11:50:37.338 CEST [15022:4] 031_recovery_conflict.pl LOG: statement: BEGIN;
2022-09-16 11:50:37.339 CEST [15022:5] 031_recovery_conflict.pl LOG: statement: LOCK TABLE test_recovery_conflict_table1 IN ACCESS SHARE MODE;
2022-09-16 11:50:37.341 CEST [15022:6] 031_recovery_conflict.pl LOG: statement: SELECT 1;
2022-09-16 11:50:38.076 CEST [14880:17] LOG: recovery still waiting after 11.482 ms: recovery conflict on lock
2022-09-16 11:50:38.076 CEST [14880:18] DETAIL: Conflicting process: 15022.
2022-09-16 11:50:38.076 CEST [14880:19] CONTEXT: WAL redo at 0/34243F0 for Standby/LOCK: xid 733 db 16385 rel 16386
2022-09-16 11:50:38.196 CEST [15022:7] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-09-16 11:50:38.196 CEST [15022:8] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-09-16 11:50:38.196 CEST [15022:9] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-09-16 11:50:38.197 CEST [15022:10] 031_recovery_conflict.pl LOG: disconnection: session time: 0:00:01.041 user=nm database=test_db host=[local]
2022-09-16 11:50:38.198 CEST [14880:20] LOG: recovery finished waiting after 132.886 ms: recovery conflict on lock

The second DETAIL should be "User was holding a relation lock for too long."
The backend in question is idle in transaction. RecoveryConflictInterrupt()
for PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK won't see IsWaitingForLock(),
so it will find no conflict. However, RecoveryConflictReason remains
clobbered, hence the wrong DETAIL message.

Aha. I'd speculated that RecoveryConflictReason must be capable of
reporting bogus errors like that up-thread.

Incidentally, the affected test
contains comment "# DROP TABLE containing block which standby has in a pinned
buffer". The standby holds no pin at that moment; the LOCK TABLE pins system
catalog pages, but it drops every pin it acquires.

Oh, I guess the comment is just wrong? There are earlier sections
concerned with buffer pins, but the section "RECOVERY CONFLICT 3" is
about locks.

noah@leadboat.com

about 3 years ago

In reply to: Thomas Munro (#22)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote:

On Thu, Dec 29, 2022 at 9:40 PM Noah Misch <noah@leadboat.com> wrote:

On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

... The regular expression machinery is capable of
consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re)
frequently to avoid getting stuck. With the patch as it stands, that
would never be true.

Surely that can't be too hard to fix. We might have to refactor
the code around QueryCancelPending a little bit so that callers
can ask "do we need a query cancel now?" without actually triggering
a longjmp ... but why would that be problematic?

It could work. The problems are like those of making code safe to run in a
signal handler. You can use e.g. snprintf in rcancelrequested(), but you
still can't use palloc() or ereport(). I see at least these strategies:

1. Accept that recovery conflict checks run after a regex call completes.
2. Have rcancelrequested() return true unconditionally if we need a conflict
check. If there's no actual conflict, restart the regex.
3. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), accept the leak of regex mallocs.
4. Have rcancelrequested() run the conflict check, including elog-using
PostgreSQL code. On longjmp(), escalate to FATAL.
5. Write the conflict check code to dutifully avoid longjmp().
6. Convert src/backend/regex to use palloc, so longjmp() is fine.

Thanks! I appreciate the help getting unstuck here. I'd thought
about some of these but not all. I also considered a couple more:

7. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is
true, and copy the error somewhere to be re-thrown later after the
regexp code exits with REG_CANCEL.
8. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is
true, and call a new regexp function that will free everything before
re-throwing.

After Tom's response I spent some time trying to figure out how to
make a SOFT_CHECK_FOR_INTERRUPTS(), which would return a value to
indicate that it would like to throw. I think it would need to re-arm
various flags and introduce a programming rule for all interrupt
processing routines that if they fired once under a soft check they
must fire again later under a non-soft check. That all seems a bit
complicated, and a general mechanism like that seemed like overkill
for a single user, which led me to idea #7.

Idea #8 is a realisation that twisting oneself into a pretzel to avoid
having to change the regexp code or its REG_CANCEL control flow may be
a bit silly. If the only thing it really needs to do is free some
memory, maybe the regexp module should provide a function that frees
everything that is safe to call from our rcancelrequested callback, so
we can do so before we longjmp back to Kansas. Then the REG_CANCEL
code paths would be effectively unreachable in PostgreSQL. I don't
know if it's better or worse than your idea #6, "use palloc instead,
it already has garbage collection, duh", but it's a different take on
the same realisation that this is just about free().

PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in
some not-yet-revealed way, (8) could get more relevant.

I guess idea #6 must be pretty easy to try: just point that MALLOC()
macro to palloc(), and do a plain old CFI() in rcancelrequested().
Why do you suggest #3 as an interim measure?

No strong reason. I think I suggested it because it's a strict subset of (6),
but I didn't think through in detail. (I've never modified src/backend/regex
and have barely read its code, for whatever that's worth.)

Do we fear that palloc() might hurt regexp performance?

Nah. I don't recall any place in PostgreSQL where performance is an argument
for raw malloc() calls.

Incidentally, the affected test
contains comment "# DROP TABLE containing block which standby has in a pinned
buffer". The standby holds no pin at that moment; the LOCK TABLE pins system
catalog pages, but it drops every pin it acquires.

Oh, I guess the comment is just wrong? There are earlier sections
concerned with buffer pins, but the section "RECOVERY CONFLICT 3" is
about locks.

Yes.

thomas.munro@gmail.com

about 3 years ago

In reply to: Noah Misch (#23)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Dec 31, 2022 at 6:36 PM Noah Misch <noah@leadboat.com> wrote:

On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote:

Idea #8 is a realisation that twisting oneself into a pretzel to avoid
having to change the regexp code or its REG_CANCEL control flow may be
a bit silly. If the only thing it really needs to do is free some
memory, maybe the regexp module should provide a function that frees
everything that is safe to call from our rcancelrequested callback, so
we can do so before we longjmp back to Kansas. Then the REG_CANCEL
code paths would be effectively unreachable in PostgreSQL. I don't
know if it's better or worse than your idea #6, "use palloc instead,
it already has garbage collection, duh", but it's a different take on
the same realisation that this is just about free().

PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in
some not-yet-revealed way, (8) could get more relevant.

I guess idea #6 must be pretty easy to try: just point that MALLOC()
macro to palloc(), and do a plain old CFI() in rcancelrequested().

It's not quite so easy: in RE_compile_and_cache we construct objects
with arbitrary cache-managed lifetime, which suggests we need a cache
memory context, but we could also fail mid construction, which
suggests we'd need a dedicated per-regex object memory context that is
made permanent with the MemoryContextSetParent() trick (as we see
elsewhere for cached things that are constructed by code that might
throw), or something like the try/catch thing from idea #8.
Thinking...

thomas.munro@gmail.com

about 3 years ago

In reply to: Thomas Munro (#24)

2 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Mon, Jan 2, 2023 at 8:38 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Dec 31, 2022 at 6:36 PM Noah Misch <noah@leadboat.com> wrote:

On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote:

Idea #8 is a realisation that twisting oneself into a pretzel to avoid
having to change the regexp code or its REG_CANCEL control flow may be
a bit silly. If the only thing it really needs to do is free some
memory, maybe the regexp module should provide a function that frees
everything that is safe to call from our rcancelrequested callback, so
we can do so before we longjmp back to Kansas. Then the REG_CANCEL
code paths would be effectively unreachable in PostgreSQL. I don't
know if it's better or worse than your idea #6, "use palloc instead,
it already has garbage collection, duh", but it's a different take on
the same realisation that this is just about free().

PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in
some not-yet-revealed way, (8) could get more relevant.

I guess idea #6 must be pretty easy to try: just point that MALLOC()
macro to palloc(), and do a plain old CFI() in rcancelrequested().

It's not quite so easy: in RE_compile_and_cache we construct objects
with arbitrary cache-managed lifetime, which suggests we need a cache
memory context, but we could also fail mid construction, which
suggests we'd need a dedicated per-regex object memory context that is
made permanent with the MemoryContextSetParent() trick (as we see
elsewhere for cached things that are constructed by code that might
throw), ...

Here's an experiment-grade attempt at idea #6 done that way, for
discussion. You can see how much memory is wasted by each regex_t,
which I guess is probably on the order of a couple of hundred KB if
you use all 32 regex cache slots using ALLOCSET_SMALL_SIZES as I did
here:

postgres=# select 'x' ~ 'hello world .*';
-[ RECORD 1 ]
?column? | f

postgres=# select * from pg_backend_memory_contexts where name =
'RegexpMemoryContext';
-[ RECORD 1 ]-+-------------------------
name | RegexpMemoryContext
ident | hello world .*
parent | RegexpCacheMemoryContext
level | 2
total_bytes | 13376
total_nblocks | 5
free_bytes | 5144
free_chunks | 8
used_bytes | 8232

There's some more memory allocated in regc_pg_locale.c with raw
malloc() that could probably benefit from a pallocisation just to be
able to measure it, but I didn't touch that here.

The recovery conflict patch itself is unchanged, except that I removed
the claim in the commit message that this would be back-patched. It's
pretty clear that this would need to spend a decent amount of time on
master only.

Attachments:

v4-0001-Use-MemoryContext-API-for-regexp-memory-managemen.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Use-MemoryContext-API-for-regexp-memory-managemen.patchDownload

From ee8eafb249368b889308fa23704f2a34a575b254 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 4 Jan 2023 14:15:40 +1300
Subject: [PATCH v4 1/2] Use MemoryContext API for regexp memory management.

Previously, regex_t objects' memory was managed using malloc() and
free() directly.  Since regexp compilation can take same time, regcomp()
would periodically call a callback function that would check for query
cancelation.  That design assumed that asynchronous query cancelation
could be detected by reading flags set by signal handlers, and that the
flags were a reliable indication that a later CHECK_FOR_INTERRUPTS()
would exit or throw an error.  This allowed the code to free memory
before aborting.

A later commit will refactor the recovery conflict interrupt system, to
move its logic out of signal handlers, because it is not
async-signal-safe (among other problems).  That would break the above
assumption, so we need another approach to memory cleanup.

Switch to using palloc(), the standard mechanism for garbage collection
in PostgreSQL.  Since regex_t objects have to survive across transaction
boundaries in a cache, introduce RegexpCacheMemoryContext.  Since
partial regex_t objects need to be cleaned up on failure to compile due
to interruption, also give each regex_t its own context.  It is
re-parented to the longer living RegexpCacheMemoryContext only if
compilation is successful, following a pattern seen elsewhere in the
tree.

XXX Experimental, may contain nuts
---
 src/backend/regex/regcomp.c    | 14 +++----
 src/backend/utils/adt/regexp.c | 70 ++++++++++++++++++++--------------
 src/include/regex/regcustom.h  |  6 +--
 3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index bb8c240598..c0f8e77b49 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -2471,17 +2471,17 @@ rfree(regex_t *re)
 /*
  * rcancelrequested - check for external request to cancel regex operation
  *
- * Return nonzero to fail the operation with error code REG_CANCEL,
- * zero to keep going
- *
- * The current implementation is Postgres-specific.  If we ever get around
- * to splitting the regex code out as a standalone library, there will need
- * to be some API to let applications define a callback function for this.
+ * The current implementation always returns 0, if CHECK_FOR_INTERRUPTS()
+ * doesn't exit non-locally via ereport().  Memory allocated while compiling is
+ * expected to be cleaned up by virtue of being allocated using palloc in a
+ * suitable memory context.
  */
 static int
 rcancelrequested(void)
 {
-	return InterruptPending && (QueryCancelPending || ProcDiePending);
+	CHECK_FOR_INTERRUPTS();
+
+	return 0;
 }
 
 /*
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 810dcb85b6..0336bf70f3 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -96,9 +96,13 @@ typedef struct regexp_matches_ctx
 #define MAX_CACHED_RES	32
 #endif
 
+/* A parent memory context for regular expressions. */
+static MemoryContext RegexpCacheMemoryContext;
+
 /* this structure describes one cached regular expression */
 typedef struct cached_re_str
 {
+	MemoryContext cre_context;	/* memory context for this regexp */
 	char	   *cre_pat;		/* original RE (not null terminated!) */
 	int			cre_pat_len;	/* length of original RE, in bytes */
 	int			cre_flags;		/* compile flags: extended,icase etc */
@@ -145,6 +149,7 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	int			regcomp_result;
 	cached_re_str re_temp;
 	char		errMsg[100];
+	MemoryContext oldcontext;
 
 	/*
 	 * Look for a match among previously compiled REs.  Since the data
@@ -172,6 +177,13 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 		}
 	}
 
+	/* Set up the cache memory on first go through. */
+	if (unlikely(RegexpCacheMemoryContext == NULL))
+		RegexpCacheMemoryContext =
+			AllocSetContextCreate(TopMemoryContext,
+								  "RegexpCacheMemoryContext",
+								  ALLOCSET_SMALL_SIZES);
+
 	/*
 	 * Couldn't find it, so try to compile the new RE.  To avoid leaking
 	 * resources on failure, we build into the re_temp local.
@@ -183,6 +195,18 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 									   pattern,
 									   text_re_len);
 
+	/*
+	 * Make a memory context for this compiled regexp.  This is initially a
+	 * child of the current memory context, so it will be cleaned up
+	 * automatically if compilation is interrupted and throws an ERROR.
+	 * We'll re-parent it under the longer lived cache context if we make it
+	 * to the bottom of this function.
+	 */
+	re_temp.cre_context = AllocSetContextCreate(CurrentMemoryContext,
+												"RegexpMemoryContext",
+												ALLOCSET_SMALL_SIZES);
+	oldcontext = MemoryContextSwitchTo(re_temp.cre_context);
+
 	regcomp_result = pg_regcomp(&re_temp.cre_re,
 								pattern,
 								pattern_len,
@@ -193,37 +217,24 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 
 	if (regcomp_result != REG_OKAY)
 	{
-		/* re didn't compile (no need for pg_regfree, if so) */
-
-		/*
-		 * Here and in other places in this file, do CHECK_FOR_INTERRUPTS
-		 * before reporting a regex error.  This is so that if the regex
-		 * library aborts and returns REG_CANCEL, we don't print an error
-		 * message that implies the regex was invalid.
-		 */
-		CHECK_FOR_INTERRUPTS();
-
+		/* re didn't compile */
 		pg_regerror(regcomp_result, &re_temp.cre_re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 				 errmsg("invalid regular expression: %s", errMsg)));
 	}
 
+	/* Copy the pattern into the per-regexp memory context. */
+	re_temp.cre_pat = palloc(text_re_len + 1);
+	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+
 	/*
-	 * We use malloc/free for the cre_pat field because the storage has to
-	 * persist across transactions, and because we want to get control back on
-	 * out-of-memory.  The Max() is because some malloc implementations return
-	 * NULL for malloc(0).
+	 * NUL-terminate it only for the benefit of the identifier used for the
+	 * memory context, visible int he pg_backend_memory_contexts view.
 	 */
-	re_temp.cre_pat = malloc(Max(text_re_len, 1));
-	if (re_temp.cre_pat == NULL)
-	{
-		pg_regfree(&re_temp.cre_re);
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-	}
-	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+	re_temp.cre_pat[text_re_len] = 0;
+	MemoryContextSetIdentifier(re_temp.cre_context, re_temp.cre_pat);
+
 	re_temp.cre_pat_len = text_re_len;
 	re_temp.cre_flags = cflags;
 	re_temp.cre_collation = collation;
@@ -236,16 +247,21 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	{
 		--num_res;
 		Assert(num_res < MAX_CACHED_RES);
-		pg_regfree(&re_array[num_res].cre_re);
-		free(re_array[num_res].cre_pat);
+		/* Delete the memory context holding the regexp and pattern. */
+		MemoryContextDelete(re_array[num_res].cre_context);
 	}
 
+	/* Re-parent the memory context to our long-lived cache context. */
+	MemoryContextSetParent(re_temp.cre_context, RegexpCacheMemoryContext);
+
 	if (num_res > 0)
 		memmove(&re_array[1], &re_array[0], num_res * sizeof(cached_re_str));
 
 	re_array[0] = re_temp;
 	num_res++;
 
+	MemoryContextSwitchTo(oldcontext);
+
 	return &re_array[0].cre_re;
 }
 
@@ -283,7 +299,6 @@ RE_wchar_execute(regex_t *re, pg_wchar *data, int data_len,
 	if (regexec_result != REG_OKAY && regexec_result != REG_NOMATCH)
 	{
 		/* re failed??? */
-		CHECK_FOR_INTERRUPTS();
 		pg_regerror(regexec_result, re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -1976,7 +1991,6 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 
 		default:
 			/* re failed??? */
-			CHECK_FOR_INTERRUPTS();
 			pg_regerror(re_result, re, errMsg, sizeof(errMsg));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -1990,7 +2004,7 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 	slen = pg_wchar2mb_with_len(str, result, slen);
 	Assert(slen < maxlen);
 
-	free(str);
+	pfree(str);
 
 	return result;
 }
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index fc158e1bb7..8f4025128e 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -49,9 +49,9 @@
 
 /* overrides for regguts.h definitions, if any */
 #define FUNCPTR(name, args) (*name) args
-#define MALLOC(n)		malloc(n)
-#define FREE(p)			free(VS(p))
-#define REALLOC(p,n)	realloc(VS(p),n)
+#define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)
+#define FREE(p)			pfree(VS(p))
+#define REALLOC(p,n)	repalloc_extended(VS(p),(n), MCXT_ALLOC_NO_OOM)
 #define assert(x)		Assert(x)
 
 /* internal character type and related */
-- 
2.35.1

v4-0002-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v4-0002-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From dcae4fed08e34b43d450ebbd39aa1590acfc6693 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v4 2/2] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  Move all recovery conflict
checking logic into the next CFI following the standard pattern.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  12 +-
 src/backend/tcop/postgres.c          | 312 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 172 insertions(+), 163 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3fb38a25cf..378f88ce11 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4373,8 +4373,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 43fbbdbc86..17ed71f790 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -658,22 +658,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleLogMemoryContextInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 31479e8212..24930f93ba 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -158,9 +158,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -179,7 +178,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2466,9 +2464,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -2993,137 +2991,190 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
 
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  We have a top-level
+					 * transaction that we can abort and a conflict that isn't
+					 * inherently non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
+		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
 
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3178,24 +3229,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3238,31 +3271,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3321,16 +3336,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3346,6 +3351,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 45e2f1fe9d..362301ccf2 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -37,12 +37,14 @@ typedef enum
 	PROCSIG_LOG_MEMORY_CONTEXT, /* ask backend to log the memory contexts */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index abd7b4fff3..ab43b638ee 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -70,8 +70,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.35.1

andres@anarazel.de

about 3 years ago

In reply to: Noah Misch (#21)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2022-12-29 00:40:52 -0800, Noah Misch wrote:

Incidentally, the affected test contains comment "# DROP TABLE containing
block which standby has in a pinned buffer". The standby holds no pin at
that moment; the LOCK TABLE pins system catalog pages, but it drops every
pin it acquires.

I guess that comment survived from an earlier version of that test (or another
test where it was copied from).

I'm inclined to just delete it.

Greetings,

Andres Freund

andres@anarazel.de

about 3 years ago

In reply to: Thomas Munro (#25)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2023-01-04 16:46:05 +1300, Thomas Munro wrote:

postgres=# select 'x' ~ 'hello world .*';
-[ RECORD 1 ]
?column? | f

postgres=# select * from pg_backend_memory_contexts where name =
'RegexpMemoryContext';
-[ RECORD 1 ]-+-------------------------
name | RegexpMemoryContext
ident | hello world .*
parent | RegexpCacheMemoryContext
level | 2
total_bytes | 13376
total_nblocks | 5

Hm, if a trivial re uses 13kB, using ALLOCSET_SMALL_SIZES might actually
increase memory usage by increasing the number of blocks.

free_bytes | 5144
free_chunks | 8
used_bytes | 8232

Hm. So we actually have a bunch of temporary allocations in here. I assume
that's all the stuff from the "non-compact" representation that
src/backend/regex/README talks about?

I doesn't immedialy look trivial to use a separate memory context for the
"final" representation and scratch memory though.

There's some more memory allocated in regc_pg_locale.c with raw
malloc() that could probably benefit from a pallocisation just to be
able to measure it, but I didn't touch that here.

It might also effectively reduce the overhead of using palloc, by filling the
context up further.

diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index bb8c240598..c0f8e77b49 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -2471,17 +2471,17 @@ rfree(regex_t *re)
/*
* rcancelrequested - check for external request to cancel regex operation
*
- * Return nonzero to fail the operation with error code REG_CANCEL,
- * zero to keep going
- *
- * The current implementation is Postgres-specific.  If we ever get around
- * to splitting the regex code out as a standalone library, there will need
- * to be some API to let applications define a callback function for this.
+ * The current implementation always returns 0, if CHECK_FOR_INTERRUPTS()
+ * doesn't exit non-locally via ereport().  Memory allocated while compiling is
+ * expected to be cleaned up by virtue of being allocated using palloc in a
+ * suitable memory context.
*/
static int
rcancelrequested(void)
{
-	return InterruptPending && (QueryCancelPending || ProcDiePending);
+	CHECK_FOR_INTERRUPTS();
+
+	return 0;
}

Hm. Seems confusing for this to continue being called rcancelrequested() and
to be called via if(CANCEL_REQUESTED()), if we're not even documenting that
it's intended to be usable that way?

Seems at the minimum we ought to keep more of the old comment, to explain the
somewhat odd API?

+	/* Set up the cache memory on first go through. */
+	if (unlikely(RegexpCacheMemoryContext == NULL))
+		RegexpCacheMemoryContext =
+			AllocSetContextCreate(TopMemoryContext,
+								  "RegexpCacheMemoryContext",
+								  ALLOCSET_SMALL_SIZES);

I think it might be nicer to create this below CacheMemoryContext? Just so the
"memory context tree" stays nicely ordered.

Greetings,

Andres Freund

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Andres Freund (#27)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Andres Freund <andres@anarazel.de> writes:

Hm. Seems confusing for this to continue being called rcancelrequested() and
to be called via if(CANCEL_REQUESTED()), if we're not even documenting that
it's intended to be usable that way?

Yeah. I'm not very happy with this line of development at all,
because I think we are painting ourselves into a corner by not allowing
code to detect whether a cancel is pending without having it happen
immediately. (That is, I do not believe that backend/regex/ is the
only code that will ever wish for that.) But if that is the direction
we're going to go in, we should probably revise these APIs to make them
less odd. I'm not sure why we'd keep the REG_CANCEL error code at all.

I think it might be nicer to create this below CacheMemoryContext?

Meh ... CacheMemoryContext might not exist yet, especially for the
use-cases in the login logic.

regards, tom lane

andres@anarazel.de

about 3 years ago

In reply to: Tom Lane (#28)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2023-01-04 17:55:43 -0500, Tom Lane wrote:

I'm not very happy with this line of development at all, because I think we
are painting ourselves into a corner by not allowing code to detect whether
a cancel is pending without having it happen immediately. (That is, I do
not believe that backend/regex/ is the only code that will ever wish for
that.)

I first wrote that this is hard to make work without introducing overhead
(like a PG_TRY in rcancelrequested()), for a bunch of reasons discussed
upthread (see [1]/messages/by-id/CA+hUKG+qtNxDQAzC20AnUxuigKYb=7shtmsuSyMekjni=ik6BA@mail.gmail.com).

But now I wonder if we didn't recently introduce most of the framework to make
this less hard / expensive.

What about using a version of errsave() that can save FATALs too? We could
have something roughly like the ProcessInterrupts() in the proposed patch that
is used from within rcancelrequested(). But instead of actually throwing the
error, we'd just remember the to-be-thrown-later error, that the next
"real" CFI would throw.

That still leaves us with some increased likelihood of erroring out within the
regex machinery, e.g. if there's an out-of-memory error within elog.c
processing. But I'd not be too worried about leaking memory in that corner
case. Which also could be closed using the approach in Thomas' patch, except
that it normally would still return in rcancelrequested().

Insane?

Greetings,

Andres Freund

[1]: /messages/by-id/CA+hUKG+qtNxDQAzC20AnUxuigKYb=7shtmsuSyMekjni=ik6BA@mail.gmail.com

thomas.munro@gmail.com

about 3 years ago

In reply to: Andres Freund (#29)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Thu, Jan 5, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:

What about using a version of errsave() that can save FATALs too? We could
have something roughly like the ProcessInterrupts() in the proposed patch that
is used from within rcancelrequested(). But instead of actually throwing the
error, we'd just remember the to-be-thrown-later error, that the next
"real" CFI would throw.

Right, I contemplated variations on that theme. I'd be willing to
code something like that to kick the tyres, but it seems like it would
make back-patching more painful? We're trying to fix bugs here...
Deciding to proceed with #6 (palloc) wouldn't mean we can't eventually
also implement two phase/soft CFI() when we have a potential user, so
I don't really get the painted-into-a-corner argument. However, it's
all moot if the #6 isn't good enough on its own merits independent of
other hypothetical future users (eg if the per regex_t MemoryContext
overheads are considered too high and can't be tuned acceptably).

andres@anarazel.de

about 3 years ago

In reply to: Thomas Munro (#30)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Hi,

On 2023-01-05 13:21:54 +1300, Thomas Munro wrote:

Right, I contemplated variations on that theme. I'd be willing to
code something like that to kick the tyres, but it seems like it would
make back-patching more painful? We're trying to fix bugs here...

I think we need to accept that this mess can't be fixed in the back
branches. I'd rather get a decent fix sometime in PG16 than a crufty fix in PG
17 that we then backpatch a while later.

Deciding to proceed with #6 (palloc) wouldn't mean we can't eventually
also implement two phase/soft CFI() when we have a potential user, so
I don't really get the painted-into-a-corner argument.

I think that's a fair point.

However, it's all moot if the #6 isn't good enough on its own merits
independent of other hypothetical future users (eg if the per regex_t
MemoryContext overheads are considered too high and can't be tuned
acceptably).

I'm not too worried about that, particularly because it looks like it'd not be
too hard to lower the overhead further. Arguably allocating memory outside of
mcxt.c is actually a bad thing independent of error handing, because it's
effectively "invisible" to our memory-usage-monitoring facilities.

Greetings,

Andres Freund

thomas.munro@gmail.com

about 3 years ago

In reply to: Tom Lane (#28)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Thu, Jan 5, 2023 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

Hm. Seems confusing for this to continue being called rcancelrequested() and
to be called via if(CANCEL_REQUESTED()), if we're not even documenting that
it's intended to be usable that way?

Yeah. I'm not very happy with this line of development at all,
because I think we are painting ourselves into a corner by not allowing
code to detect whether a cancel is pending without having it happen
immediately. (That is, I do not believe that backend/regex/ is the
only code that will ever wish for that.) But if that is the direction
we're going to go in, we should probably revise these APIs to make them
less odd. I'm not sure why we'd keep the REG_CANCEL error code at all.

Ah, OK. I had the impression from the way the code is laid out with a
wall between "PostgreSQL" bits and "vendored library" bits that we
might have some reason to want to keep that callback interface the
same (ie someone else is using this code and we want to stay in
sync?), but your reactions are a clue that maybe I imagined a
requirement that doesn't exist.

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Thomas Munro (#32)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Thomas Munro <thomas.munro@gmail.com> writes:

On Thu, Jan 5, 2023 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... But if that is the direction
we're going to go in, we should probably revise these APIs to make them
less odd. I'm not sure why we'd keep the REG_CANCEL error code at all.

Ah, OK. I had the impression from the way the code is laid out with a
wall between "PostgreSQL" bits and "vendored library" bits that we
might have some reason to want to keep that callback interface the
same (ie someone else is using this code and we want to stay in
sync?), but your reactions are a clue that maybe I imagined a
requirement that doesn't exist.

The rcancelrequested API is something that I devised out of whole cloth
awhile ago. It's not in Tcl's copy of the code, which AFAIK is the
only other project using this regex engine. I do still have vague
hopes of someday seeing the engine as a standalone project, which is
why I'd prefer to keep a bright line between the engine and Postgres.
But there's no very strong reason to think that any hypothetical future
external users who need a cancel API would want this API as opposed to
one that requires exit() or longjmp() to get out of the engine. So if
we're changing the way we use it, I think it's perfectly reasonable to
redesign that API to make it simpler and less of an impedance mismatch.

regards, tom lane

thomas.munro@gmail.com

almost 3 years ago

In reply to: Tom Lane (#33)

4 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Thu, Jan 5, 2023 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The rcancelrequested API is something that I devised out of whole cloth
awhile ago. It's not in Tcl's copy of the code, which AFAIK is the
only other project using this regex engine. I do still have vague
hopes of someday seeing the engine as a standalone project, which is
why I'd prefer to keep a bright line between the engine and Postgres.
But there's no very strong reason to think that any hypothetical future
external users who need a cancel API would want this API as opposed to
one that requires exit() or longjmp() to get out of the engine. So if
we're changing the way we use it, I think it's perfectly reasonable to
redesign that API to make it simpler and less of an impedance mismatch.

Thanks for that background. Alright then, here's a new iteration
exploring this direction. It gets rid of CANCEL_REQUESTED() ->
REG_CANCEL and the associated error and callback function, and instead
has just "INTERRUPT(re);" at those cancellation points, which is a
macro that defaults to nothing (for Tcl's benefit). Our regcustom.h
defines it as CHECK_FOR_INTERRUPTS(). I dunno if it's worth passing
the "re" argument... I was imagining that someone who wants to free
memory explicitly and then longjmp would probably need it? (It might
even be possible to expand to something that sets an error and
returns, not investigated.) Better name or design very welcome.

Another decision is to use the no-OOM version of palloc. (Not
explored: could we use throwing palloc with attribute returns_nonnull
to teach GCC and Clang to prune the failure handling from generated
regex code?) (As for STACK_TOO_DEEP(): why follow a function pointer,
when it could be macro-only too? But that's getting off track.)

I split the patch in two: memory and interrupts. I also found a place
in contrib/pg_trgm that did no-longer-needed try/finally.

Attachments:

v5-0001-Use-MemoryContext-API-for-regex-memory-management.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Use-MemoryContext-API-for-regex-memory-management.patchDownload

From 159d3d0fd7894c69f20367771b2100e48e064eec Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 4 Jan 2023 14:15:40 +1300
Subject: [PATCH v5 1/4] Use MemoryContext API for regex memory management.

Previously, regex_t objects' memory was managed with malloc() and free()
directly.  Switch to palloc()-based memory management instead.
Advantages:

 * memory used by cached regexes is now visible with MemoryContext
   observability tools

 * cleanup can be done automatically in certain failure modes
   (something that later commits will take advantage of)

 * cleanup can be done in bulk

On the downside, there may be more fragmentation (wasted memory) due to
per-regex MemoryContext objects.  This is a problem shared with other
cached objects in PostgreSQL and can probably be improved with later
tuning.

Thanks to Noah Misch for suggesting this general approach, which
unblocks later work on interrupts.

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/utils/adt/regexp.c | 57 ++++++++++++++++++++++++----------
 src/include/regex/regcustom.h  |  6 ++--
 2 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 810dcb85b6..81400ba150 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -96,9 +96,13 @@ typedef struct regexp_matches_ctx
 #define MAX_CACHED_RES	32
 #endif
 
+/* A parent memory context for regular expressions. */
+static MemoryContext RegexpCacheMemoryContext;
+
 /* this structure describes one cached regular expression */
 typedef struct cached_re_str
 {
+	MemoryContext cre_context;	/* memory context for this regexp */
 	char	   *cre_pat;		/* original RE (not null terminated!) */
 	int			cre_pat_len;	/* length of original RE, in bytes */
 	int			cre_flags;		/* compile flags: extended,icase etc */
@@ -145,6 +149,7 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	int			regcomp_result;
 	cached_re_str re_temp;
 	char		errMsg[100];
+	MemoryContext oldcontext;
 
 	/*
 	 * Look for a match among previously compiled REs.  Since the data
@@ -172,6 +177,13 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 		}
 	}
 
+	/* Set up the cache memory on first go through. */
+	if (unlikely(RegexpCacheMemoryContext == NULL))
+		RegexpCacheMemoryContext =
+			AllocSetContextCreate(TopMemoryContext,
+								  "RegexpCacheMemoryContext",
+								  ALLOCSET_SMALL_SIZES);
+
 	/*
 	 * Couldn't find it, so try to compile the new RE.  To avoid leaking
 	 * resources on failure, we build into the re_temp local.
@@ -183,6 +195,18 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 									   pattern,
 									   text_re_len);
 
+	/*
+	 * Make a memory context for this compiled regexp.  This is initially a
+	 * child of the current memory context, so it will be cleaned up
+	 * automatically if compilation is interrupted and throws an ERROR.
+	 * We'll re-parent it under the longer lived cache context if we make it
+	 * to the bottom of this function.
+	 */
+	re_temp.cre_context = AllocSetContextCreate(CurrentMemoryContext,
+												"RegexpMemoryContext",
+												ALLOCSET_SMALL_SIZES);
+	oldcontext = MemoryContextSwitchTo(re_temp.cre_context);
+
 	regcomp_result = pg_regcomp(&re_temp.cre_re,
 								pattern,
 								pattern_len,
@@ -209,21 +233,17 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 				 errmsg("invalid regular expression: %s", errMsg)));
 	}
 
+	/* Copy the pattern into the per-regexp memory context. */
+	re_temp.cre_pat = palloc(text_re_len + 1);
+	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+
 	/*
-	 * We use malloc/free for the cre_pat field because the storage has to
-	 * persist across transactions, and because we want to get control back on
-	 * out-of-memory.  The Max() is because some malloc implementations return
-	 * NULL for malloc(0).
+	 * NUL-terminate it only for the benefit of the identifier used for the
+	 * memory context, visible in the pg_backend_memory_contexts view.
 	 */
-	re_temp.cre_pat = malloc(Max(text_re_len, 1));
-	if (re_temp.cre_pat == NULL)
-	{
-		pg_regfree(&re_temp.cre_re);
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-	}
-	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+	re_temp.cre_pat[text_re_len] = 0;
+	MemoryContextSetIdentifier(re_temp.cre_context, re_temp.cre_pat);
+
 	re_temp.cre_pat_len = text_re_len;
 	re_temp.cre_flags = cflags;
 	re_temp.cre_collation = collation;
@@ -236,16 +256,21 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	{
 		--num_res;
 		Assert(num_res < MAX_CACHED_RES);
-		pg_regfree(&re_array[num_res].cre_re);
-		free(re_array[num_res].cre_pat);
+		/* Delete the memory context holding the regexp and pattern. */
+		MemoryContextDelete(re_array[num_res].cre_context);
 	}
 
+	/* Re-parent the memory context to our long-lived cache context. */
+	MemoryContextSetParent(re_temp.cre_context, RegexpCacheMemoryContext);
+
 	if (num_res > 0)
 		memmove(&re_array[1], &re_array[0], num_res * sizeof(cached_re_str));
 
 	re_array[0] = re_temp;
 	num_res++;
 
+	MemoryContextSwitchTo(oldcontext);
+
 	return &re_array[0].cre_re;
 }
 
@@ -1990,7 +2015,7 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 	slen = pg_wchar2mb_with_len(str, result, slen);
 	Assert(slen < maxlen);
 
-	free(str);
+	pfree(str);
 
 	return result;
 }
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index fc158e1bb7..8f4025128e 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -49,9 +49,9 @@
 
 /* overrides for regguts.h definitions, if any */
 #define FUNCPTR(name, args) (*name) args
-#define MALLOC(n)		malloc(n)
-#define FREE(p)			free(VS(p))
-#define REALLOC(p,n)	realloc(VS(p),n)
+#define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)
+#define FREE(p)			pfree(VS(p))
+#define REALLOC(p,n)	repalloc_extended(VS(p),(n), MCXT_ALLOC_NO_OOM)
 #define assert(x)		Assert(x)
 
 /* internal character type and related */
-- 
2.38.1

v5-0002-Update-contrib-trgm_regexp-s-memory-management.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Update-contrib-trgm_regexp-s-memory-management.patchDownload

From 0d86bacd4a0bd1d1e0ba623582735f1d298edaa2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 14 Jan 2023 13:39:14 +1300
Subject: [PATCH v5 2/4] Update contrib/trgm_regexp's memory management.

While no code change was necessary for this code to keep working, we
don't need to use PG_TRY()/PG_FINALLY() with explicit clean-up while
working with regexes anymore.

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 contrib/pg_trgm/trgm_regexp.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 9a00564ae4..42f684193d 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -549,22 +549,9 @@ createTrgmNFA(text *text_re, Oid collation,
 			   REG_ADVANCED | REG_NOSUB, collation);
 #endif
 
-	/*
-	 * Since the regexp library allocates its internal data structures with
-	 * malloc, we need to use a PG_TRY block to ensure that pg_regfree() gets
-	 * done even if there's an error.
-	 */
-	PG_TRY();
-	{
-		trg = createTrgmNFAInternal(&regex, graph, rcontext);
-	}
-	PG_FINALLY();
-	{
-		pg_regfree(&regex);
-	}
-	PG_END_TRY();
+	trg = createTrgmNFAInternal(&regex, graph, rcontext);
 
-	/* Clean up all the cruft we created */
+	/* Clean up all the cruft we created (including regex) */
 	MemoryContextSwitchTo(oldcontext);
 	MemoryContextDelete(tmpcontext);
 
-- 
2.38.1

v5-0003-Redesign-interrupt-cancel-API-for-regex-engine.patchtext/x-patch; charset=US-ASCII; name=v5-0003-Redesign-interrupt-cancel-API-for-regex-engine.patchDownload

From 9255a06dbb299692ccc56414af538106b1c01fa7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 13 Jan 2023 22:27:58 +1300
Subject: [PATCH v5 3/4] Redesign interrupt/cancel API for regex engine.

Previously, PostgreSQL's copy of the regex engine had a way to return a
special error code REG_CANCEL if it detected that the next call to
CHECK_FOR_INTERRUPTS() would certainly throw via ereport().

A later bugfix commit will move logic out of signal handlers, so that it
won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above
design impossible unless we split CHECK_FOR_INTERRUPTS() into two
phases, one to run logic and another to ereport().  We may develop such
a system in the future, but for the regex code it is no longer
necessary.

An earlier commit moved regex memory management over to our
MemoryContext system.  Given that the purpose of the purpose of the
two-phase interrupt checking was to free memory before throwing,
somethign we don't need to worry about anymore, it seems simpler to
inject CHECK_FOR_INTERRUPTS() directly into cancelation points, and just
let it throw.

Since other projects using this code might want to stay in sync with us,
do this with a new macro INTERRUPT(), customizable in regcustom.h and
defaulting to nothing.

Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/regex/regc_locale.c          |  6 +--
 src/backend/regex/regc_nfa.c             | 48 ++++--------------------
 src/backend/regex/regcomp.c              | 18 ---------
 src/backend/regex/rege_dfa.c             |  6 +--
 src/backend/regex/regexec.c              |  3 +-
 src/backend/utils/adt/regexp.c           | 11 ------
 src/include/regex/regcustom.h            |  3 +-
 src/include/regex/regerrs.h              |  4 --
 src/include/regex/regex.h                |  1 -
 src/include/regex/regguts.h              |  9 +++--
 src/test/modules/test_regex/test_regex.c |  9 -----
 11 files changed, 18 insertions(+), 100 deletions(-)

diff --git a/src/backend/regex/regc_locale.c b/src/backend/regex/regc_locale.c
index b5f3a73b1b..77d1ce2816 100644
--- a/src/backend/regex/regc_locale.c
+++ b/src/backend/regex/regc_locale.c
@@ -475,11 +475,7 @@ range(struct vars *v,			/* context */
 			}
 			addchr(cv, cc);
 		}
-		if (CANCEL_REQUESTED(v->re))
-		{
-			ERR(REG_CANCEL);
-			return NULL;
-		}
+		INTERRUPT(v->re);
 	}
 
 	return cv;
diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 60fb0bec5d..f1819a24f6 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -143,11 +143,7 @@ newstate(struct nfa *nfa)
 	 * compilation, since no code path will go very long without making a new
 	 * state or arc.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return NULL;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* first, recycle anything that's on the freelist */
 	if (nfa->freestates != NULL)
@@ -297,11 +293,7 @@ newarc(struct nfa *nfa,
 	 * compilation, since no code path will go very long without making a new
 	 * state or arc.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* check for duplicate arc, using whichever chain is shorter */
 	if (from->nouts <= to->nins)
@@ -825,11 +817,7 @@ moveins(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortins(nfa, oldState);
 		sortins(nfa, newState);
@@ -929,11 +917,7 @@ copyins(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortins(nfa, oldState);
 		sortins(nfa, newState);
@@ -1000,11 +984,7 @@ mergeins(struct nfa *nfa,
 	 * Because we bypass newarc() in this code path, we'd better include a
 	 * cancel check.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* Sort existing inarcs as well as proposed new ones */
 	sortins(nfa, s);
@@ -1125,11 +1105,7 @@ moveouts(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortouts(nfa, oldState);
 		sortouts(nfa, newState);
@@ -1226,11 +1202,7 @@ copyouts(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortouts(nfa, oldState);
 		sortouts(nfa, newState);
@@ -3282,11 +3254,7 @@ checkmatchall_recurse(struct nfa *nfa, struct state *s, bool **haspaths)
 		return false;
 
 	/* In case the search takes a long time, check for cancel */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return false;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* Create a haspath array for this state */
 	haspath = (bool *) MALLOC((DUPINF + 2) * sizeof(bool));
diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index bb8c240598..8a6cfb2973 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -86,7 +86,6 @@ static int	newlacon(struct vars *v, struct state *begin, struct state *end,
 					 int latype);
 static void freelacons(struct subre *subs, int n);
 static void rfree(regex_t *re);
-static int	rcancelrequested(void);
 static int	rstacktoodeep(void);
 
 #ifdef REG_DEBUG
@@ -356,7 +355,6 @@ struct vars
 /* static function list */
 static const struct fns functions = {
 	rfree,						/* regfree insides */
-	rcancelrequested,			/* check for cancel request */
 	rstacktoodeep				/* check for stack getting dangerously deep */
 };
 
@@ -2468,22 +2466,6 @@ rfree(regex_t *re)
 	}
 }
 
-/*
- * rcancelrequested - check for external request to cancel regex operation
- *
- * Return nonzero to fail the operation with error code REG_CANCEL,
- * zero to keep going
- *
- * The current implementation is Postgres-specific.  If we ever get around
- * to splitting the regex code out as a standalone library, there will need
- * to be some API to let applications define a callback function for this.
- */
-static int
-rcancelrequested(void)
-{
-	return InterruptPending && (QueryCancelPending || ProcDiePending);
-}
-
 /*
  * rstacktoodeep - check for stack getting dangerously deep
  *
diff --git a/src/backend/regex/rege_dfa.c b/src/backend/regex/rege_dfa.c
index ba1289c64a..1f8f2ab144 100644
--- a/src/backend/regex/rege_dfa.c
+++ b/src/backend/regex/rege_dfa.c
@@ -805,11 +805,7 @@ miss(struct vars *v,
 	 * Checking for operation cancel in the inner text search loop seems
 	 * unduly expensive.  As a compromise, check during cache misses.
 	 */
-	if (CANCEL_REQUESTED(v->re))
-	{
-		ERR(REG_CANCEL);
-		return NULL;
-	}
+	INTERRUPT(v->re);
 
 	/*
 	 * What set of states would we end up in after consuming the co character?
diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c
index 3d9ff2e607..2a1d5bebda 100644
--- a/src/backend/regex/regexec.c
+++ b/src/backend/regex/regexec.c
@@ -764,8 +764,7 @@ cdissect(struct vars *v,
 	MDEBUG(("%d: cdissect %c %ld-%ld\n", t->id, t->op, LOFF(begin), LOFF(end)));
 
 	/* handy place to check for operation cancel */
-	if (CANCEL_REQUESTED(v->re))
-		return REG_CANCEL;
+	INTERRUPT(v->re);
 	/* ... and stack overrun */
 	if (STACK_TOO_DEEP(v->re))
 		return REG_ETOOBIG;
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 81400ba150..5da3964746 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -218,15 +218,6 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	if (regcomp_result != REG_OKAY)
 	{
 		/* re didn't compile (no need for pg_regfree, if so) */
-
-		/*
-		 * Here and in other places in this file, do CHECK_FOR_INTERRUPTS
-		 * before reporting a regex error.  This is so that if the regex
-		 * library aborts and returns REG_CANCEL, we don't print an error
-		 * message that implies the regex was invalid.
-		 */
-		CHECK_FOR_INTERRUPTS();
-
 		pg_regerror(regcomp_result, &re_temp.cre_re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -308,7 +299,6 @@ RE_wchar_execute(regex_t *re, pg_wchar *data, int data_len,
 	if (regexec_result != REG_OKAY && regexec_result != REG_NOMATCH)
 	{
 		/* re failed??? */
-		CHECK_FOR_INTERRUPTS();
 		pg_regerror(regexec_result, re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -2001,7 +1991,6 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 
 		default:
 			/* re failed??? */
-			CHECK_FOR_INTERRUPTS();
 			pg_regerror(re_result, re, errMsg, sizeof(errMsg));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 8f4025128e..bedee1e9ca 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -44,7 +44,7 @@
 
 #include "mb/pg_wchar.h"
 
-#include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
+#include "miscadmin.h"			/* needed by stacktoodeep */
 
 
 /* overrides for regguts.h definitions, if any */
@@ -52,6 +52,7 @@
 #define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)
 #define FREE(p)			pfree(VS(p))
 #define REALLOC(p,n)	repalloc_extended(VS(p),(n), MCXT_ALLOC_NO_OOM)
+#define INTERRUPT(re)	CHECK_FOR_INTERRUPTS()
 #define assert(x)		Assert(x)
 
 /* internal character type and related */
diff --git a/src/include/regex/regerrs.h b/src/include/regex/regerrs.h
index 41e25f7ff0..2c8873eb81 100644
--- a/src/include/regex/regerrs.h
+++ b/src/include/regex/regerrs.h
@@ -81,7 +81,3 @@
 {
 	REG_ECOLORS, "REG_ECOLORS", "too many colors"
 },
-
-{
-	REG_CANCEL, "REG_CANCEL", "operation cancelled"
-},
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 1297abec62..d08113724f 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -156,7 +156,6 @@ typedef struct
 #define REG_BADOPT	18			/* invalid embedded option */
 #define REG_ETOOBIG 19			/* regular expression is too complex */
 #define REG_ECOLORS 20			/* too many colors */
-#define REG_CANCEL	21			/* operation cancelled */
 /* two specials for debugging and testing */
 #define REG_ATOI	101			/* convert error-code name to number */
 #define REG_ITOA	102			/* convert error-code number to name */
diff --git a/src/include/regex/regguts.h b/src/include/regex/regguts.h
index 91a52840c4..3ca3647e11 100644
--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -77,6 +77,11 @@
 #define FREE(p)		free(VS(p))
 #endif
 
+/* interruption */
+#ifndef INTERRUPT
+#define INTERRUPT(re)
+#endif
+
 /* want size of a char in bits, and max value in bounded quantifiers */
 #ifndef _POSIX2_RE_DUP_MAX
 #define _POSIX2_RE_DUP_MAX	255 /* normally from <limits.h> */
@@ -510,13 +515,9 @@ struct subre
 struct fns
 {
 	void		FUNCPTR(free, (regex_t *));
-	int			FUNCPTR(cancel_requested, (void));
 	int			FUNCPTR(stack_too_deep, (void));
 };
 
-#define CANCEL_REQUESTED(re)  \
-	((*((struct fns *) (re)->re_fns)->cancel_requested) ())
-
 #define STACK_TOO_DEEP(re)	\
 	((*((struct fns *) (re)->re_fns)->stack_too_deep) ())
 
diff --git a/src/test/modules/test_regex/test_regex.c b/src/test/modules/test_regex/test_regex.c
index 1d4f79c9d3..54a4f81187 100644
--- a/src/test/modules/test_regex/test_regex.c
+++ b/src/test/modules/test_regex/test_regex.c
@@ -185,15 +185,6 @@ test_re_compile(text *text_re, int cflags, Oid collation,
 	if (regcomp_result != REG_OKAY)
 	{
 		/* re didn't compile (no need for pg_regfree, if so) */
-
-		/*
-		 * Here and in other places in this file, do CHECK_FOR_INTERRUPTS
-		 * before reporting a regex error.  This is so that if the regex
-		 * library aborts and returns REG_CANCEL, we don't print an error
-		 * message that implies the regex was invalid.
-		 */
-		CHECK_FOR_INTERRUPTS();
-
 		pg_regerror(regcomp_result, result_re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-- 
2.38.1

v5-0004-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From 4dbc21a7516c622fde7dc7cc45aac90fdc559e8f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v5 4/4] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  Move all recovery conflict
checking logic into the next CFI following the standard pattern.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  12 +-
 src/backend/tcop/postgres.c          | 312 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 172 insertions(+), 163 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 3fb38a25cf..378f88ce11 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4373,8 +4373,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 395b2cf690..e444296f2c 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -662,22 +662,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleParallelApplyMessageInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 470b734e9e..09f564f082 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -157,9 +157,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -178,7 +177,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2465,9 +2463,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -2992,137 +2990,190 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
 
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  We have a top-level
+					 * transaction that we can abort and a conflict that isn't
+					 * inherently non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
+		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
 
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3177,24 +3228,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3237,31 +3270,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3320,16 +3335,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3345,6 +3350,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 905af2231b..6ef7298294 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -38,12 +38,14 @@ typedef enum
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index abd7b4fff3..ab43b638ee 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -70,8 +70,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.38.1

thomas.munro@gmail.com

almost 3 years ago

In reply to: Thomas Munro (#34)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Jan 14, 2023 at 3:23 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Jan 5, 2023 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The rcancelrequested API is something that I devised out of whole cloth
awhile ago. It's not in Tcl's copy of the code, which AFAIK is the
only other project using this regex engine. I do still have vague
hopes of someday seeing the engine as a standalone project, which is
why I'd prefer to keep a bright line between the engine and Postgres.
But there's no very strong reason to think that any hypothetical future
external users who need a cancel API would want this API as opposed to
one that requires exit() or longjmp() to get out of the engine. So if
we're changing the way we use it, I think it's perfectly reasonable to
redesign that API to make it simpler and less of an impedance mismatch.

Thanks for that background. Alright then, here's a new iteration
exploring this direction. It gets rid of CANCEL_REQUESTED() ->
REG_CANCEL and the associated error and callback function, and instead
has just "INTERRUPT(re);" at those cancellation points, which is a
macro that defaults to nothing (for Tcl's benefit). Our regcustom.h
defines it as CHECK_FOR_INTERRUPTS(). I dunno if it's worth passing
the "re" argument... I was imagining that someone who wants to free
memory explicitly and then longjmp would probably need it? (It might
even be possible to expand to something that sets an error and
returns, not investigated.) Better name or design very welcome.

I think this experiment worked out pretty well. I think it's a nice
side-effect that you can see what memory the regexp subsystem is
using, and that's likely to lead to more improvements. (Why is it
limited to caching 32 entries? Why is it a linear search, not a hash
table? Why is LRU implemented with memmove() and not a list? Could
we have a GUC regex_cache_memory, so someone who uses a lot of regexes
can opt into a large cache?) On the other hand it also uses a bit
more RAM, like other code using the reparenting trick, which is a
topic for future research.

I vote for proceeding with this approach. I wish we didn't have to
tackle either a regexp interface/management change (done here) or a
CFI() redesign (not done, but probably also a good idea for other
reasons) before getting this signal stuff straightened out, but here
we are. This approach seems good to me. Anyone have a different
take?

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Thomas Munro (#35)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Thomas Munro <thomas.munro@gmail.com> writes:

I think this experiment worked out pretty well. I think it's a nice
side-effect that you can see what memory the regexp subsystem is
using, and that's likely to lead to more improvements. (Why is it
limited to caching 32 entries? Why is it a linear search, not a hash
table? Why is LRU implemented with memmove() and not a list? Could
we have a GUC regex_cache_memory, so someone who uses a lot of regexes
can opt into a large cache?) On the other hand it also uses a bit
more RAM, like other code using the reparenting trick, which is a
topic for future research.

I vote for proceeding with this approach. I wish we didn't have to
tackle either a regexp interface/management change (done here) or a
CFI() redesign (not done, but probably also a good idea for other
reasons) before getting this signal stuff straightened out, but here
we are. This approach seems good to me. Anyone have a different
take?

Sorry for not looking at this sooner. I am okay with the regex
changes proposed in v5-0001 through 0003, but I think you need to
take another mopup pass there. Some specific complaints:
* header comment for pg_regprefix has been falsified (s/malloc/palloc/)
* in spell.c, regex_affix_deletion_callback could be got rid of
* check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS

In general there's a lot of comments referring to regexes being malloc'd.
I'm disinclined to change the ones inside the engine, because as far as
it knows it is still using malloc, but maybe we should work harder on
our own comments. In particular, it'd likely be useful to have something
somewhere pointing out that pg_regfree is only needed when you can't
get rid of the regex by context cleanup. Maybe write a short section
about memory management in backend/regex/README?

I've not really looked at 0004.

regards, tom lane

thomas.munro@gmail.com

almost 3 years ago

In reply to: Tom Lane (#36)

5 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Tue, Apr 4, 2023 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Sorry for not looking at this sooner. I am okay with the regex
changes proposed in v5-0001 through 0003, but I think you need to
take another mopup pass there. Some specific complaints:
* header comment for pg_regprefix has been falsified (s/malloc/palloc/)

Thanks. Fixed.

* in spell.c, regex_affix_deletion_callback could be got rid of

Done in a separate patch. I wondered if regex_t should be included
directly as a member of that union inside AFFIX, but decided it should
keep using a pointer (just without the extra wrapper struct). A
direct member would make the AFFIX slightly larger, and it would
require us to assume that regex_t is movable which it probably
actually is in practice I guess but that isn't written down anywhere
and it seemed strange to rely on it.

* check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS

I found three of these to remove (jsonpath_gram.y, varlena.c, test_regex.c).

In general there's a lot of comments referring to regexes being malloc'd.

There is also some remaining direct use of malloc() in
regc_pg_locale.c because "we mustn't lose control on out-of-memory".
At that time (2012) there was no MCXT_NO_OOM (2015), so we could
presumably bring that cache into an observable MemoryContext now too.
I haven't written a patch for that, though, because it's not in the
way of my recovery conflict mission.

I'm disinclined to change the ones inside the engine, because as far as
it knows it is still using malloc, but maybe we should work harder on
our own comments. In particular, it'd likely be useful to have something
somewhere pointing out that pg_regfree is only needed when you can't
get rid of the regex by context cleanup. Maybe write a short section
about memory management in backend/regex/README?

I'll try to write something for the README tomorrow. Here's a new
version of the code changes.

I've not really looked at 0004.

I'm hoping to get just the regex changes in ASAP, and then take a
little bit longer on the recovery conflict patch itself (v6-0005) on
the basis that it's bugfix work and not subject to the feature freeze.

Attachments:

v6-0001-Use-MemoryContext-API-for-regex-memory-management.patchtext/x-patch; charset=US-ASCII; name=v6-0001-Use-MemoryContext-API-for-regex-memory-management.patchDownload

From a21a43bf5b1ba073abb3238968b9f8d13b1b318a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 4 Jan 2023 14:15:40 +1300
Subject: [PATCH v6 1/5] Use MemoryContext API for regex memory management.

Previously, regex_t objects' memory was managed with malloc() and free()
directly.  Switch to palloc()-based memory management instead.
Advantages:

 * memory used by cached regexes is now visible with MemoryContext
   observability tools

 * cleanup can be done automatically in certain failure modes
   (something that later commits will take advantage of)

 * cleanup can be done in bulk

On the downside, there may be more fragmentation (wasted memory) due to
per-regex MemoryContext objects.  This is a problem shared with other
cached objects in PostgreSQL and can probably be improved with later
tuning.

Thanks to Noah Misch for suggesting this general approach, which
unblocks later work on interrupts.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/regex/regprefix.c  |  2 +-
 src/backend/utils/adt/regexp.c | 57 ++++++++++++++++++++++++----------
 src/include/regex/regcustom.h  |  6 ++--
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/src/backend/regex/regprefix.c b/src/backend/regex/regprefix.c
index 221f02da63..c09b2a9778 100644
--- a/src/backend/regex/regprefix.c
+++ b/src/backend/regex/regprefix.c
@@ -32,7 +32,7 @@ static int	findprefix(struct cnfa *cnfa, struct colormap *cm,
  *	REG_EXACT: all strings satisfying the regex must match the same string
  *	or a REG_XXX error code
  *
- * In the non-failure cases, *string is set to a malloc'd string containing
+ * In the non-failure cases, *string is set to a palloc'd string containing
  * the common prefix or exact value, of length *slength (measured in chrs
  * not bytes!).
  *
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 810dcb85b6..81400ba150 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -96,9 +96,13 @@ typedef struct regexp_matches_ctx
 #define MAX_CACHED_RES	32
 #endif
 
+/* A parent memory context for regular expressions. */
+static MemoryContext RegexpCacheMemoryContext;
+
 /* this structure describes one cached regular expression */
 typedef struct cached_re_str
 {
+	MemoryContext cre_context;	/* memory context for this regexp */
 	char	   *cre_pat;		/* original RE (not null terminated!) */
 	int			cre_pat_len;	/* length of original RE, in bytes */
 	int			cre_flags;		/* compile flags: extended,icase etc */
@@ -145,6 +149,7 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	int			regcomp_result;
 	cached_re_str re_temp;
 	char		errMsg[100];
+	MemoryContext oldcontext;
 
 	/*
 	 * Look for a match among previously compiled REs.  Since the data
@@ -172,6 +177,13 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 		}
 	}
 
+	/* Set up the cache memory on first go through. */
+	if (unlikely(RegexpCacheMemoryContext == NULL))
+		RegexpCacheMemoryContext =
+			AllocSetContextCreate(TopMemoryContext,
+								  "RegexpCacheMemoryContext",
+								  ALLOCSET_SMALL_SIZES);
+
 	/*
 	 * Couldn't find it, so try to compile the new RE.  To avoid leaking
 	 * resources on failure, we build into the re_temp local.
@@ -183,6 +195,18 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 									   pattern,
 									   text_re_len);
 
+	/*
+	 * Make a memory context for this compiled regexp.  This is initially a
+	 * child of the current memory context, so it will be cleaned up
+	 * automatically if compilation is interrupted and throws an ERROR.
+	 * We'll re-parent it under the longer lived cache context if we make it
+	 * to the bottom of this function.
+	 */
+	re_temp.cre_context = AllocSetContextCreate(CurrentMemoryContext,
+												"RegexpMemoryContext",
+												ALLOCSET_SMALL_SIZES);
+	oldcontext = MemoryContextSwitchTo(re_temp.cre_context);
+
 	regcomp_result = pg_regcomp(&re_temp.cre_re,
 								pattern,
 								pattern_len,
@@ -209,21 +233,17 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 				 errmsg("invalid regular expression: %s", errMsg)));
 	}
 
+	/* Copy the pattern into the per-regexp memory context. */
+	re_temp.cre_pat = palloc(text_re_len + 1);
+	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+
 	/*
-	 * We use malloc/free for the cre_pat field because the storage has to
-	 * persist across transactions, and because we want to get control back on
-	 * out-of-memory.  The Max() is because some malloc implementations return
-	 * NULL for malloc(0).
+	 * NUL-terminate it only for the benefit of the identifier used for the
+	 * memory context, visible in the pg_backend_memory_contexts view.
 	 */
-	re_temp.cre_pat = malloc(Max(text_re_len, 1));
-	if (re_temp.cre_pat == NULL)
-	{
-		pg_regfree(&re_temp.cre_re);
-		ereport(ERROR,
-				(errcode(ERRCODE_OUT_OF_MEMORY),
-				 errmsg("out of memory")));
-	}
-	memcpy(re_temp.cre_pat, text_re_val, text_re_len);
+	re_temp.cre_pat[text_re_len] = 0;
+	MemoryContextSetIdentifier(re_temp.cre_context, re_temp.cre_pat);
+
 	re_temp.cre_pat_len = text_re_len;
 	re_temp.cre_flags = cflags;
 	re_temp.cre_collation = collation;
@@ -236,16 +256,21 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	{
 		--num_res;
 		Assert(num_res < MAX_CACHED_RES);
-		pg_regfree(&re_array[num_res].cre_re);
-		free(re_array[num_res].cre_pat);
+		/* Delete the memory context holding the regexp and pattern. */
+		MemoryContextDelete(re_array[num_res].cre_context);
 	}
 
+	/* Re-parent the memory context to our long-lived cache context. */
+	MemoryContextSetParent(re_temp.cre_context, RegexpCacheMemoryContext);
+
 	if (num_res > 0)
 		memmove(&re_array[1], &re_array[0], num_res * sizeof(cached_re_str));
 
 	re_array[0] = re_temp;
 	num_res++;
 
+	MemoryContextSwitchTo(oldcontext);
+
 	return &re_array[0].cre_re;
 }
 
@@ -1990,7 +2015,7 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 	slen = pg_wchar2mb_with_len(str, result, slen);
 	Assert(slen < maxlen);
 
-	free(str);
+	pfree(str);
 
 	return result;
 }
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index fc158e1bb7..8f4025128e 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -49,9 +49,9 @@
 
 /* overrides for regguts.h definitions, if any */
 #define FUNCPTR(name, args) (*name) args
-#define MALLOC(n)		malloc(n)
-#define FREE(p)			free(VS(p))
-#define REALLOC(p,n)	realloc(VS(p),n)
+#define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)
+#define FREE(p)			pfree(VS(p))
+#define REALLOC(p,n)	repalloc_extended(VS(p),(n), MCXT_ALLOC_NO_OOM)
 #define assert(x)		Assert(x)
 
 /* internal character type and related */
-- 
2.39.2

v6-0002-Update-tsearch-regex-memory-management.patchtext/x-patch; charset=US-ASCII; name=v6-0002-Update-tsearch-regex-memory-management.patchDownload

From 6dde972500a352a382072c22f2750a351b6bed6b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 4 Apr 2023 11:20:15 +1200
Subject: [PATCH v6 2/5] Update tsearch regex memory management.

Now that our regex engine uses palloc(), it's not necessary to set up a
special memory context callback to free compiled regexes.  The regex has
no resources other than the memory that is already going to be freed in
bulk.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/tsearch/spell.c       | 34 +++++++------------------------
 src/include/tsearch/dicts/spell.h | 18 ++++++----------
 2 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index 8d48cad251..83a1836b44 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -655,17 +655,6 @@ FindWord(IspellDict *Conf, const char *word, const char *affixflag, int flag)
 	return 0;
 }
 
-/*
- * Context reset/delete callback for a regular expression used in an affix
- */
-static void
-regex_affix_deletion_callback(void *arg)
-{
-	aff_regex_struct *pregex = (aff_regex_struct *) arg;
-
-	pg_regfree(&(pregex->regex));
-}
-
 /*
  * Adds a new affix rule to the Affix field.
  *
@@ -728,7 +717,6 @@ NIAddAffix(IspellDict *Conf, const char *flag, char flagflags, const char *mask,
 		int			err;
 		pg_wchar   *wmask;
 		char	   *tmask;
-		aff_regex_struct *pregex;
 
 		Affix->issimple = 0;
 		Affix->isregis = 0;
@@ -743,31 +731,23 @@ NIAddAffix(IspellDict *Conf, const char *flag, char flagflags, const char *mask,
 		wmasklen = pg_mb2wchar_with_len(tmask, wmask, masklen);
 
 		/*
-		 * The regex engine stores its stuff using malloc not palloc, so we
-		 * must arrange to explicitly clean up the regex when the dictionary's
-		 * context is cleared.  That means the regex_t has to stay in a fixed
-		 * location within the context; we can't keep it directly in the AFFIX
-		 * struct, since we may sort and resize the array of AFFIXes.
+		 * The regex and all internal state created by pg_regcomp are allocated
+		 * in the dictionary's memory context, and will be freed automatically
+		 * when it is destroyed.
 		 */
-		Affix->reg.pregex = pregex = palloc(sizeof(aff_regex_struct));
-
-		err = pg_regcomp(&(pregex->regex), wmask, wmasklen,
+		Affix->reg.pregex = palloc(sizeof(regex_t));
+		err = pg_regcomp(Affix->reg.pregex, wmask, wmasklen,
 						 REG_ADVANCED | REG_NOSUB,
 						 DEFAULT_COLLATION_OID);
 		if (err)
 		{
 			char		errstr[100];
 
-			pg_regerror(err, &(pregex->regex), errstr, sizeof(errstr));
+			pg_regerror(err, Affix->reg.pregex, errstr, sizeof(errstr));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 					 errmsg("invalid regular expression: %s", errstr)));
 		}
-
-		pregex->mcallback.func = regex_affix_deletion_callback;
-		pregex->mcallback.arg = (void *) pregex;
-		MemoryContextRegisterResetCallback(CurrentMemoryContext,
-										   &pregex->mcallback);
 	}
 
 	Affix->flagflags = flagflags;
@@ -2161,7 +2141,7 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 		data = (pg_wchar *) palloc((newword_len + 1) * sizeof(pg_wchar));
 		data_len = pg_mb2wchar_with_len(newword, data, newword_len);
 
-		if (pg_regexec(&(Affix->reg.pregex->regex), data, data_len,
+		if (pg_regexec(Affix->reg.pregex, data, data_len,
 					   0, NULL, 0, NULL, 0) == REG_OKAY)
 		{
 			pfree(data);
diff --git a/src/include/tsearch/dicts/spell.h b/src/include/tsearch/dicts/spell.h
index 5c30af6ac6..82203edb31 100644
--- a/src/include/tsearch/dicts/spell.h
+++ b/src/include/tsearch/dicts/spell.h
@@ -81,17 +81,6 @@ typedef struct spell_struct
 
 #define SPELLHDRSZ	(offsetof(SPELL, word))
 
-/*
- * If an affix uses a regex, we have to store that separately in a struct
- * that won't move around when arrays of affixes are enlarged or sorted.
- * This is so that it can be found to be cleaned up at context destruction.
- */
-typedef struct aff_regex_struct
-{
-	regex_t		regex;
-	MemoryContextCallback mcallback;
-} aff_regex_struct;
-
 /*
  * Represents an entry in an affix list.
  */
@@ -108,7 +97,12 @@ typedef struct aff_struct
 	char	   *repl;
 	union
 	{
-		aff_regex_struct *pregex;
+		/*
+		 * Arrays of AFFIX are moved and sorted.  We'll use a pointer to
+		 * regex_t to keep this struct small, and avoid assuming that regex_t
+		 * is movable.
+		 */
+		regex_t	   *pregex;
 		Regis		regis;
 	}			reg;
 } AFFIX;
-- 
2.39.2

v6-0003-Update-contrib-trgm_regexp-s-memory-management.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Update-contrib-trgm_regexp-s-memory-management.patchDownload

From e5cf8d820668c617993a667c847d56093bfc9686 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 14 Jan 2023 13:39:14 +1300
Subject: [PATCH v6 3/5] Update contrib/trgm_regexp's memory management.

While no code change was necessary for this code to keep working, we
don't need to use PG_TRY()/PG_FINALLY() with explicit clean-up while
working with regexes anymore.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 contrib/pg_trgm/trgm_regexp.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 06cd3db67b..1d36946067 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -549,22 +549,9 @@ createTrgmNFA(text *text_re, Oid collation,
 			   REG_ADVANCED | REG_NOSUB, collation);
 #endif
 
-	/*
-	 * Since the regexp library allocates its internal data structures with
-	 * malloc, we need to use a PG_TRY block to ensure that pg_regfree() gets
-	 * done even if there's an error.
-	 */
-	PG_TRY();
-	{
-		trg = createTrgmNFAInternal(&regex, graph, rcontext);
-	}
-	PG_FINALLY();
-	{
-		pg_regfree(&regex);
-	}
-	PG_END_TRY();
+	trg = createTrgmNFAInternal(&regex, graph, rcontext);
 
-	/* Clean up all the cruft we created */
+	/* Clean up all the cruft we created (including regex) */
 	MemoryContextSwitchTo(oldcontext);
 	MemoryContextDelete(tmpcontext);
 
-- 
2.39.2

v6-0004-Redesign-interrupt-cancel-API-for-regex-engine.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Redesign-interrupt-cancel-API-for-regex-engine.patchDownload

From 0d94e7ee4ada907337bb592bb1d8bd391538e3ea Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 13 Jan 2023 22:27:58 +1300
Subject: [PATCH v6 4/5] Redesign interrupt/cancel API for regex engine.

Previously, PostgreSQL's copy of the regex engine had a way to return a
special error code REG_CANCEL if it detected that the next call to
CHECK_FOR_INTERRUPTS() would certainly throw via ereport().

A later bugfix commit will move logic out of signal handlers, so that it
won't run until the next CHECK_FOR_INTERRUPTS(), which makes the above
design impossible unless we split CHECK_FOR_INTERRUPTS() into two
phases, one to run logic and another to ereport().  We may develop such
a system in the future, but for the regex code it is no longer
necessary.

An earlier commit moved regex memory management over to our
MemoryContext system.  Given that the purpose of the two-phase interrupt
checking was to free memory before throwing, something we don't need to
worry about anymore, it seems simpler to inject CHECK_FOR_INTERRUPTS()
directly into cancelation points, and just let it throw.

Since other projects using this code might want to stay in sync with us,
do this with a new macro INTERRUPT(), customizable in regcustom.h and
defaulting to nothing.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/regex/regc_locale.c          |  6 +--
 src/backend/regex/regc_nfa.c             | 48 ++++--------------------
 src/backend/regex/regcomp.c              | 18 ---------
 src/backend/regex/rege_dfa.c             |  6 +--
 src/backend/regex/regexec.c              |  3 +-
 src/backend/utils/adt/jsonpath_gram.y    |  2 -
 src/backend/utils/adt/regexp.c           | 11 ------
 src/backend/utils/adt/varlena.c          |  1 -
 src/include/regex/regcustom.h            |  3 +-
 src/include/regex/regerrs.h              |  4 --
 src/include/regex/regex.h                |  1 -
 src/include/regex/regguts.h              |  9 +++--
 src/test/modules/test_regex/test_regex.c | 10 -----
 13 files changed, 18 insertions(+), 104 deletions(-)

diff --git a/src/backend/regex/regc_locale.c b/src/backend/regex/regc_locale.c
index b5f3a73b1b..77d1ce2816 100644
--- a/src/backend/regex/regc_locale.c
+++ b/src/backend/regex/regc_locale.c
@@ -475,11 +475,7 @@ range(struct vars *v,			/* context */
 			}
 			addchr(cv, cc);
 		}
-		if (CANCEL_REQUESTED(v->re))
-		{
-			ERR(REG_CANCEL);
-			return NULL;
-		}
+		INTERRUPT(v->re);
 	}
 
 	return cv;
diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 60fb0bec5d..f1819a24f6 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -143,11 +143,7 @@ newstate(struct nfa *nfa)
 	 * compilation, since no code path will go very long without making a new
 	 * state or arc.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return NULL;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* first, recycle anything that's on the freelist */
 	if (nfa->freestates != NULL)
@@ -297,11 +293,7 @@ newarc(struct nfa *nfa,
 	 * compilation, since no code path will go very long without making a new
 	 * state or arc.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* check for duplicate arc, using whichever chain is shorter */
 	if (from->nouts <= to->nins)
@@ -825,11 +817,7 @@ moveins(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortins(nfa, oldState);
 		sortins(nfa, newState);
@@ -929,11 +917,7 @@ copyins(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortins(nfa, oldState);
 		sortins(nfa, newState);
@@ -1000,11 +984,7 @@ mergeins(struct nfa *nfa,
 	 * Because we bypass newarc() in this code path, we'd better include a
 	 * cancel check.
 	 */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* Sort existing inarcs as well as proposed new ones */
 	sortins(nfa, s);
@@ -1125,11 +1105,7 @@ moveouts(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortouts(nfa, oldState);
 		sortouts(nfa, newState);
@@ -1226,11 +1202,7 @@ copyouts(struct nfa *nfa,
 		 * Because we bypass newarc() in this code path, we'd better include a
 		 * cancel check.
 		 */
-		if (CANCEL_REQUESTED(nfa->v->re))
-		{
-			NERR(REG_CANCEL);
-			return;
-		}
+		INTERRUPT(nfa->v->re);
 
 		sortouts(nfa, oldState);
 		sortouts(nfa, newState);
@@ -3282,11 +3254,7 @@ checkmatchall_recurse(struct nfa *nfa, struct state *s, bool **haspaths)
 		return false;
 
 	/* In case the search takes a long time, check for cancel */
-	if (CANCEL_REQUESTED(nfa->v->re))
-	{
-		NERR(REG_CANCEL);
-		return false;
-	}
+	INTERRUPT(nfa->v->re);
 
 	/* Create a haspath array for this state */
 	haspath = (bool *) MALLOC((DUPINF + 2) * sizeof(bool));
diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
index bb8c240598..8a6cfb2973 100644
--- a/src/backend/regex/regcomp.c
+++ b/src/backend/regex/regcomp.c
@@ -86,7 +86,6 @@ static int	newlacon(struct vars *v, struct state *begin, struct state *end,
 					 int latype);
 static void freelacons(struct subre *subs, int n);
 static void rfree(regex_t *re);
-static int	rcancelrequested(void);
 static int	rstacktoodeep(void);
 
 #ifdef REG_DEBUG
@@ -356,7 +355,6 @@ struct vars
 /* static function list */
 static const struct fns functions = {
 	rfree,						/* regfree insides */
-	rcancelrequested,			/* check for cancel request */
 	rstacktoodeep				/* check for stack getting dangerously deep */
 };
 
@@ -2468,22 +2466,6 @@ rfree(regex_t *re)
 	}
 }
 
-/*
- * rcancelrequested - check for external request to cancel regex operation
- *
- * Return nonzero to fail the operation with error code REG_CANCEL,
- * zero to keep going
- *
- * The current implementation is Postgres-specific.  If we ever get around
- * to splitting the regex code out as a standalone library, there will need
- * to be some API to let applications define a callback function for this.
- */
-static int
-rcancelrequested(void)
-{
-	return InterruptPending && (QueryCancelPending || ProcDiePending);
-}
-
 /*
  * rstacktoodeep - check for stack getting dangerously deep
  *
diff --git a/src/backend/regex/rege_dfa.c b/src/backend/regex/rege_dfa.c
index ba1289c64a..1f8f2ab144 100644
--- a/src/backend/regex/rege_dfa.c
+++ b/src/backend/regex/rege_dfa.c
@@ -805,11 +805,7 @@ miss(struct vars *v,
 	 * Checking for operation cancel in the inner text search loop seems
 	 * unduly expensive.  As a compromise, check during cache misses.
 	 */
-	if (CANCEL_REQUESTED(v->re))
-	{
-		ERR(REG_CANCEL);
-		return NULL;
-	}
+	INTERRUPT(v->re);
 
 	/*
 	 * What set of states would we end up in after consuming the co character?
diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c
index 3d9ff2e607..2a1d5bebda 100644
--- a/src/backend/regex/regexec.c
+++ b/src/backend/regex/regexec.c
@@ -764,8 +764,7 @@ cdissect(struct vars *v,
 	MDEBUG(("%d: cdissect %c %ld-%ld\n", t->id, t->op, LOFF(begin), LOFF(end)));
 
 	/* handy place to check for operation cancel */
-	if (CANCEL_REQUESTED(v->re))
-		return REG_CANCEL;
+	INTERRUPT(v->re);
 	/* ... and stack overrun */
 	if (STACK_TOO_DEEP(v->re))
 		return REG_ETOOBIG;
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index d34ad6b80d..adc259d5bf 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -553,8 +553,6 @@ makeItemLikeRegex(JsonPathParseItem *expr, JsonPathString *pattern,
 		{
 			char        errMsg[100];
 
-			/* See regexp.c for explanation */
-			CHECK_FOR_INTERRUPTS();
 			pg_regerror(re_result, &re_tmp, errMsg, sizeof(errMsg));
 			ereturn(escontext, false,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 81400ba150..5da3964746 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -218,15 +218,6 @@ RE_compile_and_cache(text *text_re, int cflags, Oid collation)
 	if (regcomp_result != REG_OKAY)
 	{
 		/* re didn't compile (no need for pg_regfree, if so) */
-
-		/*
-		 * Here and in other places in this file, do CHECK_FOR_INTERRUPTS
-		 * before reporting a regex error.  This is so that if the regex
-		 * library aborts and returns REG_CANCEL, we don't print an error
-		 * message that implies the regex was invalid.
-		 */
-		CHECK_FOR_INTERRUPTS();
-
 		pg_regerror(regcomp_result, &re_temp.cre_re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -308,7 +299,6 @@ RE_wchar_execute(regex_t *re, pg_wchar *data, int data_len,
 	if (regexec_result != REG_OKAY && regexec_result != REG_NOMATCH)
 	{
 		/* re failed??? */
-		CHECK_FOR_INTERRUPTS();
 		pg_regerror(regexec_result, re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -2001,7 +1991,6 @@ regexp_fixed_prefix(text *text_re, bool case_insensitive, Oid collation,
 
 		default:
 			/* re failed??? */
-			CHECK_FOR_INTERRUPTS();
 			pg_regerror(re_result, re, errMsg, sizeof(errMsg));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index f9a607adaf..b571876468 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4265,7 +4265,6 @@ replace_text_regexp(text *src_text, text *pattern_text,
 		{
 			char		errMsg[100];
 
-			CHECK_FOR_INTERRUPTS();
 			pg_regerror(regexec_result, re, errMsg, sizeof(errMsg));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
diff --git a/src/include/regex/regcustom.h b/src/include/regex/regcustom.h
index 8f4025128e..bedee1e9ca 100644
--- a/src/include/regex/regcustom.h
+++ b/src/include/regex/regcustom.h
@@ -44,7 +44,7 @@
 
 #include "mb/pg_wchar.h"
 
-#include "miscadmin.h"			/* needed by rcancelrequested/rstacktoodeep */
+#include "miscadmin.h"			/* needed by stacktoodeep */
 
 
 /* overrides for regguts.h definitions, if any */
@@ -52,6 +52,7 @@
 #define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)
 #define FREE(p)			pfree(VS(p))
 #define REALLOC(p,n)	repalloc_extended(VS(p),(n), MCXT_ALLOC_NO_OOM)
+#define INTERRUPT(re)	CHECK_FOR_INTERRUPTS()
 #define assert(x)		Assert(x)
 
 /* internal character type and related */
diff --git a/src/include/regex/regerrs.h b/src/include/regex/regerrs.h
index 41e25f7ff0..2c8873eb81 100644
--- a/src/include/regex/regerrs.h
+++ b/src/include/regex/regerrs.h
@@ -81,7 +81,3 @@
 {
 	REG_ECOLORS, "REG_ECOLORS", "too many colors"
 },
-
-{
-	REG_CANCEL, "REG_CANCEL", "operation cancelled"
-},
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 1297abec62..d08113724f 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -156,7 +156,6 @@ typedef struct
 #define REG_BADOPT	18			/* invalid embedded option */
 #define REG_ETOOBIG 19			/* regular expression is too complex */
 #define REG_ECOLORS 20			/* too many colors */
-#define REG_CANCEL	21			/* operation cancelled */
 /* two specials for debugging and testing */
 #define REG_ATOI	101			/* convert error-code name to number */
 #define REG_ITOA	102			/* convert error-code number to name */
diff --git a/src/include/regex/regguts.h b/src/include/regex/regguts.h
index 91a52840c4..3ca3647e11 100644
--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -77,6 +77,11 @@
 #define FREE(p)		free(VS(p))
 #endif
 
+/* interruption */
+#ifndef INTERRUPT
+#define INTERRUPT(re)
+#endif
+
 /* want size of a char in bits, and max value in bounded quantifiers */
 #ifndef _POSIX2_RE_DUP_MAX
 #define _POSIX2_RE_DUP_MAX	255 /* normally from <limits.h> */
@@ -510,13 +515,9 @@ struct subre
 struct fns
 {
 	void		FUNCPTR(free, (regex_t *));
-	int			FUNCPTR(cancel_requested, (void));
 	int			FUNCPTR(stack_too_deep, (void));
 };
 
-#define CANCEL_REQUESTED(re)  \
-	((*((struct fns *) (re)->re_fns)->cancel_requested) ())
-
 #define STACK_TOO_DEEP(re)	\
 	((*((struct fns *) (re)->re_fns)->stack_too_deep) ())
 
diff --git a/src/test/modules/test_regex/test_regex.c b/src/test/modules/test_regex/test_regex.c
index 1d4f79c9d3..d1dd48a993 100644
--- a/src/test/modules/test_regex/test_regex.c
+++ b/src/test/modules/test_regex/test_regex.c
@@ -185,15 +185,6 @@ test_re_compile(text *text_re, int cflags, Oid collation,
 	if (regcomp_result != REG_OKAY)
 	{
 		/* re didn't compile (no need for pg_regfree, if so) */
-
-		/*
-		 * Here and in other places in this file, do CHECK_FOR_INTERRUPTS
-		 * before reporting a regex error.  This is so that if the regex
-		 * library aborts and returns REG_CANCEL, we don't print an error
-		 * message that implies the regex was invalid.
-		 */
-		CHECK_FOR_INTERRUPTS();
-
 		pg_regerror(regcomp_result, result_re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
@@ -239,7 +230,6 @@ test_re_execute(regex_t *re, pg_wchar *data, int data_len,
 	if (regexec_result != REG_OKAY && regexec_result != REG_NOMATCH)
 	{
 		/* re failed??? */
-		CHECK_FOR_INTERRUPTS();
 		pg_regerror(regexec_result, re, errMsg, sizeof(errMsg));
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
-- 
2.39.2

v6-0005-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v6-0005-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From c69e46f609aab169cad3483962188f6f05196602 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v6 5/5] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  The previous coding also
confused the 'reason' shown in error messages by clobbering global
variables.  Move all recovery conflict checking logic into the next CFI,
and have the signal handler just set flags and the latch, following the
standard pattern.  Since there are several different reasons, use a
separate flag for each.

With this refactoring, the recovery conflict system no longer
piggy-backs on top of the regular query cancelation mechanisms, but
instead ereports directly if it decides that is necessary.  It still
needs to respect QueryCancelHoldoffCount, because otherwise the FEBE
protocol might be corrupted (see commit 2b3a8b20c2d).

For now we have agreed not to back-patch this change due to its
complexity and the regex changes that it depends on.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  12 +-
 src/backend/tcop/postgres.c          | 312 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 172 insertions(+), 163 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 908a8934bd..b902bbc2c6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4919,8 +4919,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 395b2cf690..e444296f2c 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -662,22 +662,22 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleParallelApplyMessageInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a10ecbaf50..6ec5a01bf4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -161,9 +161,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -182,7 +181,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2510,9 +2508,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -3037,137 +3035,190 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
 
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  We have a top-level
+					 * transaction that we can abort and a conflict that isn't
+					 * inherently non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
+		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
 
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3222,24 +3273,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3282,31 +3315,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3365,16 +3380,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3390,6 +3395,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 905af2231b..6ef7298294 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -38,12 +38,14 @@ typedef enum
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index abd7b4fff3..ab43b638ee 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -70,8 +70,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.39.2

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Thomas Munro (#37)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote:

I'm hoping to get just the regex changes in ASAP, and then take a
little bit longer on the recovery conflict patch itself (v6-0005) on
the basis that it's bugfix work and not subject to the feature freeze.

Agreed. It would be good to check with the RMT, but as long as that's
not at the middle/end of the beta cycle I guess that's OK for this
one, even if it is only for HEAD.
--
Michael

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Michael Paquier (#38)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Michael Paquier <michael@paquier.xyz> writes:

On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote:

I'm hoping to get just the regex changes in ASAP, and then take a
little bit longer on the recovery conflict patch itself (v6-0005) on
the basis that it's bugfix work and not subject to the feature freeze.

Agreed. It would be good to check with the RMT, but as long as that's
not at the middle/end of the beta cycle I guess that's OK for this
one, even if it is only for HEAD.

Right. regex changes pass an eyeball check here.

regards, tom lane

thomas.munro@gmail.com

over 2 years ago

In reply to: Tom Lane (#39)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

Here is a rebase over 26669757, which introduced
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT.

I got a bit confused about why this new conflict reason didn't follow
the usual ERROR->FATAL promotion rules and pinged Andres who provided:
"Logical decoding slots are only acquired while performing logical
decoding. During logical decoding no user controlled code is run.
During [sub]transaction abort, the slot is released. Therefore user
controlled code cannot intercept an error before the replication slot
is released." That's included in a comment in the attached to explain
the special treatment.

Attachments:

v7-0001-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v7-0001-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From df200ac813ce75f07cb98c4b9144a5d82b535efc Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v7] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  The previous coding also
confused the 'reason' shown in error messages by clobbering global
variables.  Move all recovery conflict checking logic into the next CFI,
and have the signal handler just set flags and the latch, following the
standard pattern.  Since there are several different reasons, use a
separate flag for each.

With this refactoring, the recovery conflict system no longer
piggy-backs on top of the regular query cancelation mechanisms, but
instead ereports directly if it decides that is necessary.  It still
needs to respect QueryCancelHoldoffCount, because otherwise the FEBE
protocol might be corrupted (see commit 2b3a8b20c2d).

Back-patch to 16.  For now we have agreed not to back-patch this change
any further than that, due to its complexity and the regex changes in
commit bea3d7e that it depends on.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  14 +-
 src/backend/tcop/postgres.c          | 333 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 187 insertions(+), 171 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index df22aaa1c5..55c484a43e 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4923,8 +4923,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c85cb5cc18..b7427906de 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -662,25 +662,25 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleParallelApplyMessageInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..24f326e99d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -161,9 +161,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -182,7 +181,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2510,9 +2508,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -3040,143 +3038,203 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
+
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is a special case that
+			 * always throws an ERROR (ie never promotes to FATAL), though it
+			 * still has to respect QueryCancelHoldoffCount, so it shares this
+			 * code path.  Logical decoding slots are only acquired while
+			 * performing logical decoding.  During logical decoding no user
+			 * controlled code is run.  During [sub]transaction abort, the
+			 * slot is released.  Therefore user controlled code cannot
+			 * intercept an error before the replication slot is released.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT ||
+				!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  Either it's the
+					 * logical slot case, or we have a top-level transaction
+					 * that we can abort and a conflict that isn't inherently
+					 * non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
-
-			case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
-				RecoveryConflictPending = true;
-				QueryCancelPending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
-
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3231,24 +3289,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3291,31 +3331,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3374,16 +3396,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3399,6 +3411,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 2f52100b00..3a3a7eca77 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -38,13 +38,15 @@ typedef enum
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index abd7b4fff3..ab43b638ee 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -70,8 +70,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.41.0

thomas.munro@gmail.com

over 2 years ago

In reply to: Thomas Munro (#40)

1 attachment(s)

Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

On Sat, Aug 5, 2023 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Here is a rebase over 26669757, which introduced
PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT.

Oops, please disregard v7 (somehow lost a precious line of code). V8 is better.

Attachments:

v8-0001-Fix-recovery-conflict-SIGUSR1-handling.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Fix-recovery-conflict-SIGUSR1-handling.patchDownload

From 6544931e533aa015f39215f9c9d2df3e06700a96 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 10 May 2022 16:00:23 +1200
Subject: [PATCH v8] Fix recovery conflict SIGUSR1 handling.

We shouldn't be doing real work in a signal handler, to avoid reaching
code that is not safe in that context.  The previous coding also
confused the 'reason' shown in error messages by clobbering global
variables.  Move all recovery conflict checking logic into the next CFI,
and have the signal handler just set flags and the latch, following the
standard pattern.  Since there are several different reasons, use a
separate flag for each.

With this refactoring, the recovery conflict system no longer
piggy-backs on top of the regular query cancelation mechanisms, but
instead ereports directly if it decides that is necessary.  It still
needs to respect QueryCancelHoldoffCount, because otherwise the FEBE
protocol might be corrupted (see commit 2b3a8b20c2d).

Back-patch to 16.  For now we have agreed not to back-patch this change
any further than that, due to its complexity and the regex changes in
commit bea3d7e that it depends on.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGK3PGKwcKqzoosamn36YW-fsuTdOPPF1i_rtEO%3DnEYKSg%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |   4 +-
 src/backend/storage/ipc/procsignal.c |  14 +-
 src/backend/tcop/postgres.c          | 333 ++++++++++++++-------------
 src/include/storage/procsignal.h     |   4 +-
 src/include/tcop/tcopprot.h          |   3 +-
 5 files changed, 188 insertions(+), 170 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index df22aaa1c5..55c484a43e 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4923,8 +4923,8 @@ LockBufferForCleanup(Buffer buffer)
 }
 
 /*
- * Check called from RecoveryConflictInterrupt handler when Startup
- * process requests cancellation of all pin holders that are blocking it.
+ * Check called from ProcessRecoveryConflictInterrupts() when Startup process
+ * requests cancellation of all pin holders that are blocking it.
  */
 bool
 HoldingBufferPinThatDelaysRecovery(void)
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c85cb5cc18..b7427906de 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -662,25 +662,25 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 		HandleParallelApplyMessageInterrupt();
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_TABLESPACE))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_TABLESPACE);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
 
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
-		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+		HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
 	SetLatch(MyLatch);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..fab976227f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -161,9 +161,8 @@ static bool EchoQuery = false;	/* -E switch */
 static bool UseSemiNewlineNewline = false;	/* -j switch */
 
 /* whether or not, and why, we were canceled by conflict with recovery */
-static bool RecoveryConflictPending = false;
-static bool RecoveryConflictRetryable = true;
-static ProcSignalReason RecoveryConflictReason;
+static volatile sig_atomic_t RecoveryConflictPending = false;
+static volatile sig_atomic_t RecoveryConflictPendingReasons[NUM_PROCSIGNALS];
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
 static MemoryContext row_description_context = NULL;
@@ -182,7 +181,6 @@ static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
 static int	errdetail_params(ParamListInfo params);
 static int	errdetail_abort(void);
-static int	errdetail_recovery_conflict(void);
 static void bind_param_error_callback(void *arg);
 static void start_xact_command(void);
 static void finish_xact_command(void);
@@ -2510,9 +2508,9 @@ errdetail_abort(void)
  * Add an errdetail() line showing conflict source.
  */
 static int
-errdetail_recovery_conflict(void)
+errdetail_recovery_conflict(ProcSignalReason reason)
 {
-	switch (RecoveryConflictReason)
+	switch (reason)
 	{
 		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 			errdetail("User was holding shared buffer pin for too long.");
@@ -3040,143 +3038,205 @@ FloatExceptionHandler(SIGNAL_ARGS)
 }
 
 /*
- * RecoveryConflictInterrupt: out-of-line portion of recovery conflict
- * handling following receipt of SIGUSR1. Designed to be similar to die()
- * and StatementCancelHandler(). Called only by a normal user backend
- * that begins a transaction during recovery.
+ * Tell the next CHECK_FOR_INTERRUPTS() to check for a particular type of
+ * recovery conflict.  Runs in a SIGUSR1 handler.
  */
 void
-RecoveryConflictInterrupt(ProcSignalReason reason)
+HandleRecoveryConflictInterrupt(ProcSignalReason reason)
 {
-	int			save_errno = errno;
+	RecoveryConflictPendingReasons[reason] = true;
+	RecoveryConflictPending = true;
+	InterruptPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
 
-	/*
-	 * Don't joggle the elbow of proc_exit
-	 */
-	if (!proc_exit_inprogress)
+/*
+ * Check one individual conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
+{
+	switch (reason)
 	{
-		RecoveryConflictReason = reason;
-		switch (reason)
-		{
-			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
+		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
 
-				/*
-				 * If we aren't waiting for a lock we can never deadlock.
-				 */
-				if (!IsWaitingForLock())
-					return;
+			/*
+			 * If we aren't waiting for a lock we can never deadlock.
+			 */
+			if (!IsWaitingForLock())
+				return;
 
-				/* Intentional fall through to check wait for pin */
-				/* FALLTHROUGH */
+			/* Intentional fall through to check wait for pin */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
+		case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
-				/*
-				 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
-				 * aren't blocking the Startup process there is nothing more
-				 * to do.
-				 *
-				 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
-				 * requested, if we're waiting for locks and the startup
-				 * process is not waiting for buffer pin (i.e., also waiting
-				 * for locks), we set the flag so that ProcSleep() will check
-				 * for deadlocks.
-				 */
-				if (!HoldingBufferPinThatDelaysRecovery())
-				{
-					if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
-						GetStartupBufferPinWaitBufId() < 0)
-						CheckDeadLockAlert();
-					return;
-				}
+			/*
+			 * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is requested but we
+			 * aren't blocking the Startup process there is nothing more to
+			 * do.
+			 *
+			 * When PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is requested,
+			 * if we're waiting for locks and the startup process is not
+			 * waiting for buffer pin (i.e., also waiting for locks), we set
+			 * the flag so that ProcSleep() will check for deadlocks.
+			 */
+			if (!HoldingBufferPinThatDelaysRecovery())
+			{
+				if (reason == PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+					GetStartupBufferPinWaitBufId() < 0)
+					CheckDeadLockAlert();
+				return;
+			}
 
-				MyProc->recoveryConflictPending = true;
+			MyProc->recoveryConflictPending = true;
 
-				/* Intentional fall through to error handling */
-				/* FALLTHROUGH */
+			/* Intentional fall through to error handling */
+			/* FALLTHROUGH */
 
-			case PROCSIG_RECOVERY_CONFLICT_LOCK:
-			case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
-			case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
+		case PROCSIG_RECOVERY_CONFLICT_LOCK:
+		case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
+		case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
 
+			/*
+			 * If we aren't in a transaction any longer then ignore.
+			 */
+			if (!IsTransactionOrTransactionBlock())
+				return;
+
+			/* FALLTHROUGH */
+
+		case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
+
+			/*
+			 * If we're not in a subtransaction then we are OK to throw an
+			 * ERROR to resolve the conflict.  Otherwise drop through to the
+			 * FATAL case.
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is a special case that
+			 * always throws an ERROR (ie never promotes to FATAL), though it
+			 * still has to respect QueryCancelHoldoffCount, so it shares this
+			 * code path.  Logical decoding slots are only acquired while
+			 * performing logical decoding.  During logical decoding no user
+			 * controlled code is run.  During [sub]transaction abort, the
+			 * slot is released.  Therefore user controlled code cannot
+			 * intercept an error before the replication slot is released.
+			 *
+			 * XXX other times that we can throw just an ERROR *may* be
+			 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in parent
+			 * transactions
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held by
+			 * parent transactions and the transaction is not
+			 * transaction-snapshot mode
+			 *
+			 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
+			 * cursors open in parent transactions
+			 */
+			if (reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT ||
+				!IsSubTransaction())
+			{
 				/*
-				 * If we aren't in a transaction any longer then ignore.
+				 * If we already aborted then we no longer need to cancel.  We
+				 * do this here since we do not wish to ignore aborted
+				 * subtransactions, which must cause FATAL, currently.
 				 */
-				if (!IsTransactionOrTransactionBlock())
+				if (IsAbortedTransactionBlockState())
 					return;
 
 				/*
-				 * If we can abort just the current subtransaction then we are
-				 * OK to throw an ERROR to resolve the conflict. Otherwise
-				 * drop through to the FATAL case.
-				 *
-				 * XXX other times that we can throw just an ERROR *may* be
-				 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
-				 * parent transactions
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
-				 * by parent transactions and the transaction is not
-				 * transaction-snapshot mode
-				 *
-				 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
-				 * cursors open in parent transactions
+				 * If a recovery conflict happens while we are waiting for
+				 * input from the client, the client is presumably just
+				 * sitting idle in a transaction, preventing recovery from
+				 * making progress.  We'll drop through to the FATAL case
+				 * below to dislodge it, in that case.
 				 */
-				if (!IsSubTransaction())
+				if (!DoingCommandRead)
 				{
-					/*
-					 * If we already aborted then we no longer need to cancel.
-					 * We do this here since we do not wish to ignore aborted
-					 * subtransactions, which must cause FATAL, currently.
-					 */
-					if (IsAbortedTransactionBlockState())
+					/* Avoid losing sync in the FE/BE protocol. */
+					if (QueryCancelHoldoffCount != 0)
+					{
+						/*
+						 * Re-arm and defer this interrupt until later.  See
+						 * similar code in ProcessInterrupts().
+						 */
+						RecoveryConflictPendingReasons[reason] = true;
+						RecoveryConflictPending = true;
+						InterruptPending = true;
 						return;
+					}
 
-					RecoveryConflictPending = true;
-					QueryCancelPending = true;
-					InterruptPending = true;
+					/*
+					 * We are cleared to throw an ERROR.  Either it's the
+					 * logical slot case, or we have a top-level transaction
+					 * that we can abort and a conflict that isn't inherently
+					 * non-retryable.
+					 */
+					LockErrorCleanup();
+					pgstat_report_recovery_conflict(reason);
+					ereport(ERROR,
+							(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+							 errmsg("canceling statement due to conflict with recovery"),
+							 errdetail_recovery_conflict(reason)));
 					break;
 				}
+			}
 
-				/* Intentional fall through to session cancel */
-				/* FALLTHROUGH */
-
-			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
-				RecoveryConflictPending = true;
-				ProcDiePending = true;
-				InterruptPending = true;
-				break;
-
-			case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
-				RecoveryConflictPending = true;
-				QueryCancelPending = true;
-				InterruptPending = true;
-				break;
+			/* Intentional fall through to session cancel */
+			/* FALLTHROUGH */
 
-			default:
-				elog(FATAL, "unrecognized conflict mode: %d",
-					 (int) reason);
-		}
+		case PROCSIG_RECOVERY_CONFLICT_DATABASE:
 
-		Assert(RecoveryConflictPending && (QueryCancelPending || ProcDiePending));
+			/*
+			 * Retrying is not possible because the database is dropped, or we
+			 * decided above that we couldn't resolve the conflict with an
+			 * ERROR and fell through.  Terminate the session.
+			 */
+			pgstat_report_recovery_conflict(reason);
+			ereport(FATAL,
+					(errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+							 ERRCODE_DATABASE_DROPPED :
+							 ERRCODE_T_R_SERIALIZATION_FAILURE),
+					 errmsg("terminating connection due to conflict with recovery"),
+					 errdetail_recovery_conflict(reason),
+					 errhint("In a moment you should be able to reconnect to the"
+							 " database and repeat your command.")));
+			break;
 
-		/*
-		 * All conflicts apart from database cause dynamic errors where the
-		 * command or transaction can be retried at a later point with some
-		 * potential for success. No need to reset this, since non-retryable
-		 * conflict errors are currently FATAL.
-		 */
-		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
-			RecoveryConflictRetryable = false;
+		default:
+			elog(FATAL, "unrecognized conflict mode: %d", (int) reason);
 	}
+}
+
+/*
+ * Check each possible recovery conflict reason.
+ */
+static void
+ProcessRecoveryConflictInterrupts(void)
+{
+	ProcSignalReason reason;
 
 	/*
-	 * Set the process latch. This function essentially emulates signal
-	 * handlers like die() and StatementCancelHandler() and it seems prudent
-	 * to behave similarly as they do.
+	 * We don't need to worry about joggling the elbow of proc_exit, because
+	 * proc_exit_prepare() holds interrupts, so ProcessInterrupts() won't call
+	 * us.
 	 */
-	SetLatch(MyLatch);
+	Assert(!proc_exit_inprogress);
+	Assert(InterruptHoldoffCount == 0);
+	Assert(RecoveryConflictPending);
 
-	errno = save_errno;
+	RecoveryConflictPending = false;
+
+	for (reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
+		 reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
+		 reason++)
+	{
+		if (RecoveryConflictPendingReasons[reason])
+		{
+			RecoveryConflictPendingReasons[reason] = false;
+			ProcessRecoveryConflictInterrupt(reason);
+		}
+	}
 }
 
 /*
@@ -3231,24 +3291,6 @@ ProcessInterrupts(void)
 			 */
 			proc_exit(1);
 		}
-		else if (RecoveryConflictPending && RecoveryConflictRetryable)
-		{
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
-		else if (RecoveryConflictPending)
-		{
-			/* Currently there is only one non-retryable recovery conflict */
-			Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(FATAL,
-					(errcode(ERRCODE_DATABASE_DROPPED),
-					 errmsg("terminating connection due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 		else if (IsBackgroundWorker)
 			ereport(FATAL,
 					(errcode(ERRCODE_ADMIN_SHUTDOWN),
@@ -3291,31 +3333,13 @@ ProcessInterrupts(void)
 				 errmsg("connection to client lost")));
 	}
 
-	/*
-	 * If a recovery conflict happens while we are waiting for input from the
-	 * client, the client is presumably just sitting idle in a transaction,
-	 * preventing recovery from making progress.  Terminate the connection to
-	 * dislodge it.
-	 */
-	if (RecoveryConflictPending && DoingCommandRead)
-	{
-		QueryCancelPending = false; /* this trumps QueryCancel */
-		RecoveryConflictPending = false;
-		LockErrorCleanup();
-		pgstat_report_recovery_conflict(RecoveryConflictReason);
-		ereport(FATAL,
-				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-				 errmsg("terminating connection due to conflict with recovery"),
-				 errdetail_recovery_conflict(),
-				 errhint("In a moment you should be able to reconnect to the"
-						 " database and repeat your command.")));
-	}
-
 	/*
 	 * Don't allow query cancel interrupts while reading input from the
 	 * client, because we might lose sync in the FE/BE protocol.  (Die
 	 * interrupts are OK, because we won't read any further messages from the
 	 * client in that case.)
+	 *
+	 * See similar logic in ProcessRecoveryConflictInterrupts().
 	 */
 	if (QueryCancelPending && QueryCancelHoldoffCount != 0)
 	{
@@ -3374,16 +3398,6 @@ ProcessInterrupts(void)
 					(errcode(ERRCODE_QUERY_CANCELED),
 					 errmsg("canceling autovacuum task")));
 		}
-		if (RecoveryConflictPending)
-		{
-			RecoveryConflictPending = false;
-			LockErrorCleanup();
-			pgstat_report_recovery_conflict(RecoveryConflictReason);
-			ereport(ERROR,
-					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
-					 errmsg("canceling statement due to conflict with recovery"),
-					 errdetail_recovery_conflict()));
-		}
 
 		/*
 		 * If we are reading a command from the client, just ignore the cancel
@@ -3399,6 +3413,9 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (RecoveryConflictPending)
+		ProcessRecoveryConflictInterrupts();
+
 	if (IdleInTransactionSessionTimeoutPending)
 	{
 		/*
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 2f52100b00..3a3a7eca77 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -38,13 +38,15 @@ typedef enum
 	PROCSIG_PARALLEL_APPLY_MESSAGE, /* Message from parallel apply workers */
 
 	/* Recovery conflict reasons */
-	PROCSIG_RECOVERY_CONFLICT_DATABASE,
+	PROCSIG_RECOVERY_CONFLICT_FIRST,
+	PROCSIG_RECOVERY_CONFLICT_DATABASE = PROCSIG_RECOVERY_CONFLICT_FIRST,
 	PROCSIG_RECOVERY_CONFLICT_TABLESPACE,
 	PROCSIG_RECOVERY_CONFLICT_LOCK,
 	PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
 	PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+	PROCSIG_RECOVERY_CONFLICT_LAST = PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index abd7b4fff3..ab43b638ee 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -70,8 +70,7 @@ extern void die(SIGNAL_ARGS);
 extern void quickdie(SIGNAL_ARGS) pg_attribute_noreturn();
 extern void StatementCancelHandler(SIGNAL_ARGS);
 extern void FloatExceptionHandler(SIGNAL_ARGS) pg_attribute_noreturn();
-extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
-																 * handler */
+extern void HandleRecoveryConflictInterrupt(ProcSignalReason reason);
 extern void ProcessClientReadInterrupt(bool blocked);
 extern void ProcessClientWriteInterrupt(bool blocked);
 
-- 
2.41.0