stopgap fix for signal handling during restore_command

Started by Nathan Bossartabout 3 years ago29 messageshackers
Jump to latest
#1Nathan Bossart
nathandbossart@gmail.com

On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:

On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

I'm happy to create a new thread if needed, but I can't tell if there is
any interest in this stopgap/back-branch fix. Perhaps we should just jump
straight to the long-term fix that Thomas is looking into.

Unfortunately the latch-friendly subprocess module proposal I was
talking about would be for 17. I may post a thread fairly soon with
design ideas + list of problems and decision points as I see them, and
hopefully some sketch code, but it won't be a proposal for [/me checks
calendar] next week's commitfest and probably wouldn't be appropriate
in a final commitfest anyway, and I also have some other existing
stuff to clear first. So please do continue with the stopgap ideas.

Okay, here is a new thread...

Since v8.4, the startup process will proc_exit() immediately within its
SIGTERM handler while the restore_command executes via system(). Some
recent changes added unsafe code to the section where this behavior is
enabled [0]/messages/by-id/20230201105514.rsjl4bnhb65giyvo@alap3.anarazel.de. The long-term fix likely includes moving away from system()
completely, but we may want to have a stopgap/back-branch fix while that is
under development.

I've attached a patch set for a proposed stopgap fix. 0001 simply moves
the extra code outside of the Pre/PostRestoreCommand() block so that only
system() is executed while the SIGTERM handler might proc_exit(). This
restores the behavior that was in place from v8.4 to v14, so I don't expect
it to be too controversial. 0002 adds code to startup's SIGTERM handler to
call _exit() instead of proc_exit() if we are in a forked process from
system(), etc. It also adds assertions to ensure proc_exit(), ProcKill(),
and AuxiliaryProcKill() are not called within such forked processes.

Thoughts?

[0]: /messages/by-id/20230201105514.rsjl4bnhb65giyvo@alap3.anarazel.de

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patchtext/x-diff; charset=us-asciiDownload+11-5
v6-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patchtext/x-diff; charset=us-asciiDownload+24-2
#2Thomas Munro
thomas.munro@gmail.com
In reply to: Nathan Bossart (#1)
Re: stopgap fix for signal handling during restore_command

On Fri, Feb 24, 2023 at 12:15 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:

Thoughts?

I think you should have a trailing \n when writing to stderr.

Here's that reproducer I speculated about (sorry I confused SIGQUIT
and SIGTERM in my earlier email, ENOCOFFEE). Seems to do the job, and
I tested on a Linux box for good measure. If you comment out the
kill(), "check PROVE_TESTS=t/002_archiving.pl" works fine
(demonstrating that that definition of system() works fine). With the
kill(), it reliably reaches 'TRAP: failed Assert("latch->owner_pid ==
MyProcPid")' without your patch, and with your patch it avoids it. (I
believe glibc's system() could reach it too with the right timing, but
I didn't try, my point being that the use of the OpenBSD system() here
is only because it's easier to grok and to wrangle.)

Attachments:

0001-XXX-inject-SIGTERM-into-system-at-an-inconvenient-mo.xatchapplication/octet-stream; name=0001-XXX-inject-SIGTERM-into-system-at-an-inconvenient-mo.xatchDownload+67-2
#3Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#2)
Re: stopgap fix for signal handling during restore_command

On Fri, Feb 24, 2023 at 1:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:

ENOCOFFEE

Erm, I realised after sending that I'd accidentally sent a version
that uses fork() anyway, and now if I change it back to vfork() it
doesn't fail the way I wanted to demonstrate, at least on Linux. I
don't have time or desire to dig into how Linux vfork() really works
so I'll leave it at that... but the patch as posted does seem to be a
useful tool for understanding this failure... please just ignore the
confused comments about fork() vs vfork() therein.

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#2)
Re: stopgap fix for signal handling during restore_command

On Fri, Feb 24, 2023 at 01:25:01PM +1300, Thomas Munro wrote:

I think you should have a trailing \n when writing to stderr.

Oops. I added that in v7.

Here's that reproducer I speculated about (sorry I confused SIGQUIT
and SIGTERM in my earlier email, ENOCOFFEE). Seems to do the job, and
I tested on a Linux box for good measure. If you comment out the
kill(), "check PROVE_TESTS=t/002_archiving.pl" works fine
(demonstrating that that definition of system() works fine). With the
kill(), it reliably reaches 'TRAP: failed Assert("latch->owner_pid ==
MyProcPid")' without your patch, and with your patch it avoids it. (I
believe glibc's system() could reach it too with the right timing, but
I didn't try, my point being that the use of the OpenBSD system() here
is only because it's easier to grok and to wrangle.)

Thanks for providing the reproducer! I am seeing the behavior that you
described on my Linux machine.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patchtext/x-diff; charset=us-asciiDownload+11-5
v7-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patchtext/x-diff; charset=us-asciiDownload+24-2
#5Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#4)
Re: stopgap fix for signal handling during restore_command

Hi,

On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:>

if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			const char	msg[] = "StartupProcShutdownHandler() called in child process\n";
+			int			rc pg_attribute_unused();
+
+			rc = write(STDERR_FILENO, msg, sizeof(msg));
+			_exit(1);
+		}
+	}

Why do we need that rc variable? Don't we normally get away with (void)
write(...)?

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..e3da0622d7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
dlist_head *procgloballist;

Assert(MyProc != NULL);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

/* Make sure we're out of the sync rep lists */
SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
PGPROC *proc;

Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

auxproc = &AuxiliaryProcs[proctype];

--
2.25.1

I think the much more interesting assertion here would be to check that
MyProc->pid equals the current pid.

Greetings,

Andres Freund

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#5)
Re: stopgap fix for signal handling during restore_command

On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:

On 2023-02-23 20:33:23 -0800, Nathan Bossart wrote:>

if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			const char	msg[] = "StartupProcShutdownHandler() called in child process\n";
+			int			rc pg_attribute_unused();
+
+			rc = write(STDERR_FILENO, msg, sizeof(msg));
+			_exit(1);
+		}
+	}

Why do we need that rc variable? Don't we normally get away with (void)
write(...)?

My compiler complains about that. :/

../postgresql/src/backend/postmaster/startup.c: In function ‘StartupProcShutdownHandler’:
../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result]
139 | (void) write(STDERR_FILENO, msg, sizeof(msg));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..e3da0622d7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
dlist_head *procgloballist;

Assert(MyProc != NULL);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

/* Make sure we're out of the sync rep lists */
SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
PGPROC *proc;

Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

auxproc = &AuxiliaryProcs[proctype];

--
2.25.1

I think the much more interesting assertion here would be to check that
MyProc->pid equals the current pid.

I don't mind changing this, but why is this a more interesting assertion?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#6)
Re: stopgap fix for signal handling during restore_command

On Sat, Feb 25, 2023 at 11:28:25AM -0800, Nathan Bossart wrote:

On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:

I think the much more interesting assertion here would be to check that
MyProc->pid equals the current pid.

I don't mind changing this, but why is this a more interesting assertion?

Here is a new patch set with this change.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patchtext/x-diff; charset=us-asciiDownload+11-5
v8-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patchtext/x-diff; charset=us-asciiDownload+24-2
#8Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#6)
Re: stopgap fix for signal handling during restore_command

Hi,

On 2023-02-25 11:28:25 -0800, Nathan Bossart wrote:

On Sat, Feb 25, 2023 at 11:07:42AM -0800, Andres Freund wrote:

Why do we need that rc variable? Don't we normally get away with (void)
write(...)?

My compiler complains about that. :/

../postgresql/src/backend/postmaster/startup.c: In function ‘StartupProcShutdownHandler’:
../postgresql/src/backend/postmaster/startup.c:139:11: error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result]
139 | (void) write(STDERR_FILENO, msg, sizeof(msg));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Ick. I guess we've already encountered this, because we've apparently removed
all the (void) write cases. Which I am certain we had at some point. We still
do it for a bunch of other functions though. Ah, yes: aa90e148ca7,
27314d32a88, 6c72a28e5ce etc.

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..e3da0622d7 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
dlist_head *procgloballist;

Assert(MyProc != NULL);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

/* Make sure we're out of the sync rep lists */
SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
PGPROC *proc;

Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+ Assert(MyProcPid == (int) getpid()); /* not safe if forked by system(), etc. */

auxproc = &AuxiliaryProcs[proctype];

--
2.25.1

I think the much more interesting assertion here would be to check that
MyProc->pid equals the current pid.

I don't mind changing this, but why is this a more interesting assertion?

Because we so far have little to no protection against some state corruption
leading to releasing PGPROC that's not ours. I didn't actually mean that we
shouldn't check that MyProcPid == (int) getpid(), just that the much more
interesting case to check is that MyProc->pid matches, because that protect
against multiple releases, releasing the wrong slot, etc.

Greetings,

Andres Freund

#9Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#8)
Re: stopgap fix for signal handling during restore_command

On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.

While looking around for the right place to put this, I noticed that
there's a write_stderr() function in elog.c that we might be able to use.
I used that in v9. WDYT?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v9-0001-Move-extra-code-out-of-the-Pre-PostRestoreCommand.patchtext/x-diff; charset=us-asciiDownload+11-5
v9-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-f.patchtext/x-diff; charset=us-asciiDownload+21-2
#10Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#9)
Re: stopgap fix for signal handling during restore_command

Hi,

On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:

On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.

While looking around for the right place to put this, I noticed that
there's a write_stderr() function in elog.c that we might be able to use.
I used that in v9. WDYT?

write_stderr() isn't signal safe, from what I can tell.

#11Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#10)
Re: stopgap fix for signal handling during restore_command

On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote:

On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:

On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.

While looking around for the right place to put this, I noticed that
there's a write_stderr() function in elog.c that we might be able to use.
I used that in v9. WDYT?

write_stderr() isn't signal safe, from what I can tell.

*facepalm* Sorry.

What precisely did you have in mind? AFAICT you are asking for a wrapper
around write().

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#11)
Re: stopgap fix for signal handling during restore_command

Hi,

On 2023-02-26 11:39:00 -0800, Nathan Bossart wrote:

On Sun, Feb 26, 2023 at 10:00:29AM -0800, Andres Freund wrote:

On 2023-02-25 14:06:29 -0800, Nathan Bossart wrote:

On Sat, Feb 25, 2023 at 11:52:53AM -0800, Andres Freund wrote:

I think I opined on this before, but we really ought to have a function to do
some minimal signal safe output. Implemented centrally, instead of being open
coded in a bunch of places.

While looking around for the right place to put this, I noticed that
there's a write_stderr() function in elog.c that we might be able to use.
I used that in v9. WDYT?

write_stderr() isn't signal safe, from what I can tell.

*facepalm* Sorry.

What precisely did you have in mind? AFAICT you are asking for a wrapper
around write().

Partially I just want something that can easily be searched for, that can have
comments attached to it documenting why what it is doing is safe.

It'd not be a huge amount of work to have a slow and restricted string
interpolation support, to make it easier to write messages. Converting floats
is probably too hard to do safely, and I'm not sure %m can safely be
supported. But basic things like %d would be pretty simple.

Basically a loop around the format string that directly writes to stderr using
write(), and only supports a signal safe subset of normal format strings.

Greetings,

Andres Freund

#13Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#12)
Re: stopgap fix for signal handling during restore_command

On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote:

On 2023-02-26 11:39:00 -0800, Nathan Bossart wrote:

What precisely did you have in mind? AFAICT you are asking for a wrapper
around write().

Partially I just want something that can easily be searched for, that can have
comments attached to it documenting why what it is doing is safe.

It'd not be a huge amount of work to have a slow and restricted string
interpolation support, to make it easier to write messages. Converting floats
is probably too hard to do safely, and I'm not sure %m can safely be
supported. But basic things like %d would be pretty simple.

Basically a loop around the format string that directly writes to stderr using
write(), and only supports a signal safe subset of normal format strings.

Got it, thanks. I will try to put something together along these lines,
although I don't know if I'll pick up the interpolation support in this
thread.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#13)
Re: stopgap fix for signal handling during restore_command

On Tue, Feb 28, 2023 at 08:36:03PM -0800, Nathan Bossart wrote:

On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote:

Partially I just want something that can easily be searched for, that can have
comments attached to it documenting why what it is doing is safe.

It'd not be a huge amount of work to have a slow and restricted string
interpolation support, to make it easier to write messages. Converting floats
is probably too hard to do safely, and I'm not sure %m can safely be
supported. But basic things like %d would be pretty simple.

Basically a loop around the format string that directly writes to stderr using
write(), and only supports a signal safe subset of normal format strings.

Got it, thanks. I will try to put something together along these lines,
although I don't know if I'll pick up the interpolation support in this
thread.

Here is an attempt at adding a signal safe function for writing to STDERR.

I didn't add support for format strings, but looking ahead, I think one
challenge will be avoiding va_start() and friends. In any case, IMO format
string support probably deserves its own thread.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v10-0001-Move-extra-code-out-of-the-Pre-PostRestoreComman.patchtext/x-diff; charset=us-asciiDownload+11-5
v10-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-.patchtext/x-diff; charset=us-asciiDownload+50-7
#15Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#14)
Re: stopgap fix for signal handling during restore_command

Hi,

On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:

On Tue, Feb 28, 2023 at 08:36:03PM -0800, Nathan Bossart wrote:

On Sun, Feb 26, 2023 at 12:12:27PM -0800, Andres Freund wrote:

Partially I just want something that can easily be searched for, that can have
comments attached to it documenting why what it is doing is safe.

It'd not be a huge amount of work to have a slow and restricted string
interpolation support, to make it easier to write messages. Converting floats
is probably too hard to do safely, and I'm not sure %m can safely be
supported. But basic things like %d would be pretty simple.

Basically a loop around the format string that directly writes to stderr using
write(), and only supports a signal safe subset of normal format strings.

Got it, thanks. I will try to put something together along these lines,
although I don't know if I'll pick up the interpolation support in this
thread.

Here is an attempt at adding a signal safe function for writing to STDERR.

Cool.

I didn't add support for format strings, but looking ahead, I think one
challenge will be avoiding va_start() and friends. In any case, IMO format
string support probably deserves its own thread.

Makes sense to split that off.

FWIW, I think we could rely on va_start() et al to be signal safe. The
standardese isn't super clear about this, because they aren't functions, and
posix only talks about functions being async signal safe...

Greetings,

Andres Freund

#16Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#15)
Re: stopgap fix for signal handling during restore_command

On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote:

FWIW, I think we could rely on va_start() et al to be signal safe. The
standardese isn't super clear about this, because they aren't functions, and
posix only talks about functions being async signal safe...

Good to know. I couldn't tell whether that was a safe assumption from
briefly reading around.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#17Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#15)
Re: stopgap fix for signal handling during restore_command

On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote:

On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:

Here is an attempt at adding a signal safe function for writing to STDERR.

Cool.

I'm gently bumping this thread to see if anyone had additional feedback on
the proposed patches [0]/messages/by-id/20230301224751.GA1823946@nathanxps13. The intent was to back-patch these as needed and
to pursue a long-term fix in v17. Are there any concerns with that?

[0]: /messages/by-id/20230301224751.GA1823946@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Nathan Bossart (#17)
Re: stopgap fix for signal handling during restore_command

On 22.04.23 01:07, Nathan Bossart wrote:

On Wed, Mar 01, 2023 at 03:13:04PM -0800, Andres Freund wrote:

On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:

Here is an attempt at adding a signal safe function for writing to STDERR.

Cool.

I'm gently bumping this thread to see if anyone had additional feedback on
the proposed patches [0]. The intent was to back-patch these as needed and
to pursue a long-term fix in v17. Are there any concerns with that?

[0] /messages/by-id/20230301224751.GA1823946@nathanxps13

Is this still being contemplated? What is the status of this?

#19Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#18)
Re: stopgap fix for signal handling during restore_command

On Sun, Oct 01, 2023 at 08:50:15PM +0200, Peter Eisentraut wrote:

Is this still being contemplated? What is the status of this?

I'll plan on committing this in the next couple of days.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#20Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#14)
Re: stopgap fix for signal handling during restore_command

On 2023-03-01 14:47:51 -0800, Nathan Bossart wrote:

Subject: [PATCH v10 1/2] Move extra code out of the Pre/PostRestoreCommand()
block.

LGTM

From fb6957da01f11b75d1a1966f32b00e2e77c789a0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandbossart@gmail.com>
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v10 2/2] Don't proc_exit() in startup's SIGTERM handler if
forked by system().

Instead, emit a message to STDERR and _exit() in this case. This
change also adds assertions to proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
src/backend/postmaster/startup.c | 17 ++++++++++++++++-
src/backend/storage/ipc/ipc.c | 3 +++
src/backend/storage/lmgr/proc.c | 2 ++
src/backend/utils/error/elog.c | 28 ++++++++++++++++++++++++++++
src/include/utils/elog.h | 6 +-----
5 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..b9e2c3aafe 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
dlist_head *procgloballist;

Assert(MyProc != NULL);
+ Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */

/* Make sure we're out of the sync rep lists */
SyncRepCleanupAtProcExit();
@@ -925,6 +926,7 @@ AuxiliaryProcKill(int code, Datum arg)
PGPROC *proc;

Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
+ Assert(MyProc->pid == (int) getpid()); /* not safe if forked by system(), etc. */

auxproc = &AuxiliaryProcs[proctype];

I'd make these elog(PANIC), I think. The paths are not performance critical
enough that a single branch hurts, so the overhead of the check is irrelevant,
and the consequences of calling ProcKill() twice for the same process are very
severe.

+/*
+ * Write a message to STDERR using only async-signal-safe functions.  This can
+ * be used to safely emit a message from a signal handler.
+ *
+ * TODO: It is likely possible to safely do a limited amount of string
+ * interpolation (e.g., %s and %d), but that is not presently supported.
+ */
+void
+write_stderr_signal_safe(const char *fmt)

As is, this isn't a format, so I'd probably just name it s or str :)

-/*
- * Write errors to stderr (or by equal means when stderr is
- * not available). Used before ereport/elog can be used
- * safely (memory context, GUC load etc)
- */
extern void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2);
+extern void write_stderr_signal_safe(const char *fmt);

Not sure why you removed the comment?

Greetings,

Andres Freund

#21Nathan Bossart
nathandbossart@gmail.com
In reply to: Andres Freund (#20)
#22Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#21)
#23Andres Freund
andres@anarazel.de
In reply to: Nathan Bossart (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#23)
#25Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#24)
#26Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#25)
#27Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#27)
#29Nathan Bossart
nathandbossart@gmail.com
In reply to: Tom Lane (#28)