Fix early elog(FATAL)

Started by Noah Mischabout 1 year ago7 messages
#1Noah Misch
noah@leadboat.com
1 attachment(s)

main() says:

/*
* Fire up essential subsystems: error and memory management
*
* Code after this point is allowed to use elog/ereport, though
* localization of messages may not work right away, and messages won't go
* anywhere but stderr until GUC settings get loaded.
*/
MemoryContextInit();

However, appending elog(ERROR, "whoops") breaks like:

$ initdb -D discard_me
FATAL: whoops
PANIC: proc_exit() called in child process
no data was returned by command ""/home/nm/sw/nopath/pghead/bin/postgres" -V"
child process was terminated by signal 6: Aborted

So does the ereport(FATAL) in ClosePostmasterPorts(). The "called in child
process" check (added in commit 97550c0 of 2023-10) reads MyProcPid, which we
set later. Three ways to fix this:

1. Call InitProcessGlobals() earlier. This could also reduce the total call
sites from 3 to 2 (main() and post-fork).

2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
This has less to go wrong in back branches. While probably irrelevant,
this avoids calling pg_prng_strong_seed() in processes that will exit after
help() or GucInfoMain().

3. Revert 97550c0, as commit 3b00fdb anticipated.

I don't think the choice matters much, so here is (2).

Attachments:

early-elog-v1.patchtext/plain; charset=us-asciiDownload
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Fix elog(FATAL) before PostmasterMain() or just after fork().
    
    Since commit 97550c0711972a9856b5db751539bbaf2f88884c, these failed with
    "PANIC:  proc_exit() called in child process" due to uninitialized or
    stale MyProcPid.  That was reachable if close() failed in
    ClosePostmasterPorts() or setlocale(category, "C") failed, both
    unlikely.  Back-patch to v13 (all supported versions).
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3acb46b..e286810 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -109,6 +109,7 @@ main(int argc, char *argv[])
 	 * localization of messages may not work right away, and messages won't go
 	 * anywhere but stderr until GUC settings get loaded.
 	 */
+	MyProcPid = getpid();
 	MemoryContextInit();
 
 	/*
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index 5e42a74..b3010e3 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -19,6 +19,7 @@
 #include <unistd.h>
 
 #include "libpq/pqsignal.h"
+#include "miscadmin.h"
 #include "postmaster/fork_process.h"
 
 #ifndef WIN32
@@ -66,6 +67,7 @@ fork_process(void)
 	if (result == 0)
 	{
 		/* fork succeeded, in child */
+		MyProcPid = getpid();
 #ifdef LINUX_PROFILE
 		setitimer(ITIMER_PROF, &prof_itimer, NULL);
 #endif
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce00f40..f0f9c66 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1899,14 +1899,13 @@ ClosePostmasterPorts(bool am_syslogger)
 
 
 /*
- * InitProcessGlobals -- set MyProcPid, MyStartTime[stamp], random seeds
+ * InitProcessGlobals -- set MyStartTime[stamp], random seeds
  *
  * Called early in the postmaster and every backend.
  */
 void
 InitProcessGlobals(void)
 {
-	MyProcPid = getpid();
 	MyStartTimestamp = GetCurrentTimestamp();
 	MyStartTime = timestamptz_to_time_t(MyStartTimestamp);
 
diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index 6ca2d4e..bdaa9f1 100644
--- a/src/port/pqsignal.c
+++ b/src/port/pqsignal.c
@@ -74,8 +74,7 @@ static volatile pqsigfunc pqsignal_handlers[PG_NSIG];
 /*
  * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this function
  * as the handler for all signals.  This wrapper handler function checks that
- * it is called within a process that the server knows about (i.e., any process
- * that has called InitProcessGlobals(), such as a client backend), and not a
+ * it is called within a process that knew to maintain MyProcPid, and not a
  * child process forked by system(3), etc.  This check ensures that such child
  * processes do not modify shared memory, which is often detrimental.  If the
  * check succeeds, the function originally provided to pqsignal() is called.
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Noah Misch (#1)
Re: Fix early elog(FATAL)

On Sat, Dec 07, 2024 at 07:46:14PM -0800, Noah Misch wrote:

main() says:

/*
* Fire up essential subsystems: error and memory management
*
* Code after this point is allowed to use elog/ereport, though
* localization of messages may not work right away, and messages won't go
* anywhere but stderr until GUC settings get loaded.
*/
MemoryContextInit();

However, appending elog(ERROR, "whoops") breaks like:

$ initdb -D discard_me
FATAL: whoops
PANIC: proc_exit() called in child process
no data was returned by command ""/home/nm/sw/nopath/pghead/bin/postgres" -V"
child process was terminated by signal 6: Aborted

So does the ereport(FATAL) in ClosePostmasterPorts(). The "called in child
process" check (added in commit 97550c0 of 2023-10) reads MyProcPid, which we
set later. Three ways to fix this:

I noticed that you committed a fix for this. Sorry for not responding
earlier.

1. Call InitProcessGlobals() earlier. This could also reduce the total call
sites from 3 to 2 (main() and post-fork).

2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
This has less to go wrong in back branches. While probably irrelevant,
this avoids calling pg_prng_strong_seed() in processes that will exit after
help() or GucInfoMain().

3. Revert 97550c0, as commit 3b00fdb anticipated.

I did partially revert 97550c0 in commit 8fd0498, but we decided to leave
some of the checks [0]/messages/by-id/20231122225945.3kgclsgz5lqmtnan@awork3.anarazel.de.

I don't think the choice matters much, so here is (2).

FWIW I'd probably vote for option 1. That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug. I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

[0]: /messages/by-id/20231122225945.3kgclsgz5lqmtnan@awork3.anarazel.de

--
nathan

#3Noah Misch
noah@leadboat.com
In reply to: Nathan Bossart (#2)
Re: Fix early elog(FATAL)

On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:

On Sat, Dec 07, 2024 at 07:46:14PM -0800, Noah Misch wrote:

Three ways to fix this:

I noticed that you committed a fix for this. Sorry for not responding
earlier.

1. Call InitProcessGlobals() earlier. This could also reduce the total call
sites from 3 to 2 (main() and post-fork).

2. Move MyProcPid init out of InitProcessGlobals(), to main() and post-fork.
This has less to go wrong in back branches. While probably irrelevant,
this avoids calling pg_prng_strong_seed() in processes that will exit after
help() or GucInfoMain().

3. Revert 97550c0, as commit 3b00fdb anticipated.

I did partially revert 97550c0 in commit 8fd0498, but we decided to leave
some of the checks

Got it. Old branches couldn't merely do (3), anyway, since 3b00fdb is v17+.
I missed that while writing the list.

I don't think the choice matters much, so here is (2).

FWIW I'd probably vote for option 1. That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug. I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

Can you say more about that? A comment about MyProcPid could say "fork() is
the one thing that changes the getpid() return value". To me, the things
InitProcessGlobals() sets are all different. MyProcPid can be set without
elog(ERROR) and gets invalidated at fork(). The others reasonably could
elog(ERROR). (They currently don't.) The random state could have a different
lifecycle. If we had a builtin pooler that reused processes, we'd
reinitialize random state at each process reuse, not at each fork(). So I see
the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
accident of history.

Thanks,
nm

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Noah Misch (#3)
Re: Fix early elog(FATAL)

On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:

On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:

FWIW I'd probably vote for option 1. That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug. I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

Can you say more about that? A comment about MyProcPid could say "fork() is
the one thing that changes the getpid() return value". To me, the things
InitProcessGlobals() sets are all different. MyProcPid can be set without
elog(ERROR) and gets invalidated at fork(). The others reasonably could
elog(ERROR). (They currently don't.) The random state could have a different
lifecycle. If we had a builtin pooler that reused processes, we'd
reinitialize random state at each process reuse, not at each fork(). So I see
the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
accident of history.

Fair enough. I suppose part of my hesitation stems from expecting hackers
to be more likely to remember to call InitProcessGlobals() than to
initialize MyProcPid. But given your change requires initializing
MyProcPid in exactly 2 places, and there are unlikely to be more in the
near future, I might be overthinking it.

--
nathan

#5Noah Misch
noah@leadboat.com
In reply to: Nathan Bossart (#4)
Re: Fix early elog(FATAL)

On Thu, Dec 12, 2024 at 10:07:00AM -0600, Nathan Bossart wrote:

On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:

On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:

FWIW I'd probably vote for option 1. That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug. I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

Can you say more about that? A comment about MyProcPid could say "fork() is
the one thing that changes the getpid() return value". To me, the things
InitProcessGlobals() sets are all different. MyProcPid can be set without
elog(ERROR) and gets invalidated at fork(). The others reasonably could
elog(ERROR). (They currently don't.) The random state could have a different
lifecycle. If we had a builtin pooler that reused processes, we'd
reinitialize random state at each process reuse, not at each fork(). So I see
the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
accident of history.

Fair enough. I suppose part of my hesitation stems from expecting hackers
to be more likely to remember to call InitProcessGlobals() than to
initialize MyProcPid. But given your change requires initializing
MyProcPid in exactly 2 places, and there are unlikely to be more in the
near future, I might be overthinking it.

I don't feel strongly either way. I did write it the option-1 way originally.
Then I started thinking about changes at a distance causing the other
InitProcessGlobals() tasks to palloc or elog. We could do option-1 in master
and keep the back branches in their current state.

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Noah Misch (#5)
Re: Fix early elog(FATAL)

On Fri, Dec 13, 2024 at 07:15:05PM -0800, Noah Misch wrote:

On Thu, Dec 12, 2024 at 10:07:00AM -0600, Nathan Bossart wrote:

On Wed, Dec 11, 2024 at 07:34:14PM -0800, Noah Misch wrote:

On Tue, Dec 10, 2024 at 04:18:19PM -0600, Nathan Bossart wrote:

FWIW I'd probably vote for option 1. That keeps the initialization of the
globals together, reduces the call sites, and fixes the bug. I'd worry a
little about moving the MyProcPid assignments out of that function without
adding a bunch of commentary to explain why.

Can you say more about that? A comment about MyProcPid could say "fork() is
the one thing that changes the getpid() return value". To me, the things
InitProcessGlobals() sets are all different. MyProcPid can be set without
elog(ERROR) and gets invalidated at fork(). The others reasonably could
elog(ERROR). (They currently don't.) The random state could have a different
lifecycle. If we had a builtin pooler that reused processes, we'd
reinitialize random state at each process reuse, not at each fork(). So I see
the grouping of (MyProcPid, MyStartTimestamp, random seed) as mostly an
accident of history.

I just noticed that InitProcessGlobals() is relatively new. It was added
in v12 by commit 197e4af.

Fair enough. I suppose part of my hesitation stems from expecting hackers
to be more likely to remember to call InitProcessGlobals() than to
initialize MyProcPid. But given your change requires initializing
MyProcPid in exactly 2 places, and there are unlikely to be more in the
near future, I might be overthinking it.

I don't feel strongly either way. I did write it the option-1 way originally.
Then I started thinking about changes at a distance causing the other
InitProcessGlobals() tasks to palloc or elog. We could do option-1 in master
and keep the back branches in their current state.

I don't feel strongly either way, either. I don't think it's important
enough to diverge from the back-branches.

--
nathan

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#6)
Re: Fix early elog(FATAL)

On Fri, Dec 13, 2024 at 10:14:43PM -0600, Nathan Bossart wrote:

On Fri, Dec 13, 2024 at 07:15:05PM -0800, Noah Misch wrote:

I don't feel strongly either way. I did write it the option-1 way originally.
Then I started thinking about changes at a distance causing the other
InitProcessGlobals() tasks to palloc or elog. We could do option-1 in master
and keep the back branches in their current state.

I don't feel strongly either way, either. I don't think it's important
enough to diverge from the back-branches.

Also, thank you for fixing this.

--
nathan