Refactoring backend fork+exec code

Started by Heikki Linnakangasalmost 3 years ago63 messageshackers
Jump to latest
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

I started to look at the code in postmaster.c related to launching child
processes. I tried to reduce the difference between EXEC_BACKEND and
!EXEC_BACKEND code paths, and put the code that needs to differ behind a
better abstraction. I started doing this to help with implementing
multi-threading, but it doesn't introduce anything thread-related yet
and I think this improves readability anyway.

This is still work-inprogress, especially the last, big, patch in the
patch set. Mainly, I need to clean up the comments in the new
launch_backend.c file. But the other patches are in pretty good shape,
and if you ignore launch_backend.c, you can see the effect on the other
source files.

With these patches, there is a new function for launching a postmaster
child process:

pid_t postmaster_child_launch(PostmasterChildType child_type, char
*startup_data, size_t startup_data_len, ClientSocket *client_sock);

This function hides the differences between EXEC_BACKEND and
!EXEC_BACKEND cases.

In 'startup_data', the caller can pass a blob of data to the child
process, with different meaning for different kinds of child processes.
For a backend process, for example, it's used to pass the CAC_state,
which indicates whether the backend accepts the connection or just sends
"too many clients" error. And for background workers, it's used to pass
the BackgroundWorker struct. The startup data is passed to the child
process in the

ClientSocket is a new struct holds a socket FD, and the local and remote
address info. Before this patch set, postmaster initializes the Port
structs but only fills in those fields in it. With this patch set, we
have a new ClientSocket struct just for those fields, which makes it
more clear which fields are initialized where.

I haven't done much testing yet, and no testing at all on Windows, so
that's probably still broken.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Allocate-Backend-structs-in-PostmasterContext.patchtext/x-patch; charset=UTF-8; name=0001-Allocate-Backend-structs-in-PostmasterContext.patchDownload+19-17
0002-Pass-background-worker-entry-in-the-parameter-file.patchtext/x-patch; charset=UTF-8; name=0002-Pass-background-worker-entry-in-the-parameter-file.patchDownload+25-38
0003-Refactor-CreateSharedMemoryAndSemaphores.patchtext/x-patch; charset=UTF-8; name=0003-Refactor-CreateSharedMemoryAndSemaphores.patchDownload+41-63
0004-Use-FD_CLOEXEC-on-ListenSockets.patchtext/x-patch; charset=UTF-8; name=0004-Use-FD_CLOEXEC-on-ListenSockets.patchDownload+11-10
0005-Move-too-many-clients-already-et-al.-checks-from-Pro.patchtext/x-patch; charset=UTF-8; name=0005-Move-too-many-clients-already-et-al.-checks-from-Pro.patchDownload+43-44
0006-Pass-CAC-as-argument-to-backend-process.patchtext/x-patch; charset=UTF-8; name=0006-Pass-CAC-as-argument-to-backend-process.patchDownload+31-25
0007-Remove-ConnCreate-and-ConnFree-and-allocate-Port-in-.patchtext/x-patch; charset=UTF-8; name=0007-Remove-ConnCreate-and-ConnFree-and-allocate-Port-in-.patchDownload+10-59
0008-Introduce-ClientSocket-rename-some-funcs.patchtext/x-patch; charset=UTF-8; name=0008-Introduce-ClientSocket-rename-some-funcs.patchDownload+147-125
0009-Refactor-postmaster-child-process-launching.patchtext/x-patch; charset=UTF-8; name=0009-Refactor-postmaster-child-process-launching.patchDownload+1799-2006
#2Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#1)
Re: Refactoring backend fork+exec code

From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 16:33:20 +0300
Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

@@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
void
StreamClose(pgsocket sock)
{
-       closesocket(sock);
+       if (closesocket(sock) != 0)
+               elog(LOG, "closesocket failed: %m");
}

/*

Do you think WARNING would be a more appropriate log level?

From 2f518be9e96cfed1a1a49b4af8f7cb4a837aa784 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 18:07:54 +0300
Subject: [PATCH 5/9] Move "too many clients already" et al. checks from
ProcessStartupPacket.

This seems like a change you could push already (assuming another
maintainer agrees with you), which makes reviews for this patchset even
easier.

From c25b67c045018a2bf05e6ff53819d26e561fc83f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 14:11:16 +0300
Subject: [PATCH 6/9] Pass CAC as argument to backend process.

Could you expand a bit more on why it is better to pass it as a separate
argument? Does it not fit well conceptually in struct Port?

@@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
* returns the pid of the fork/exec'd process, or -1 on failure
*/
static pid_t
-backend_forkexec(Port *port)
+backend_forkexec(Port *port, CAC_state cac)
{
-       char       *av[4];
+       char       *av[5];
int                     ac = 0;
+       char            cacbuf[10];

av[ac++] = "postgres";
av[ac++] = "--forkbackend";
av[ac++] = NULL; /* filled in by internal_forkexec */

+       snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
+       av[ac++] = cacbuf;

Might be worth a sanity check that there wasn't any truncation into
cacbuf, which is an impossibility as the code is written, but still
useful for catching a future developer error.

Is it worth adding a command line option at all instead of having the
naked positional argument? It would help anybody who might read the
command line what the seemingly random integer stands for.

@@ -4910,7 +4926,10 @@ SubPostmasterMain(int argc, char *argv[])
/* Run backend or appropriate child */
if (strcmp(argv[1], "--forkbackend") == 0)
{
-               Assert(argc == 3);              /* shouldn't be any more args */
+               CAC_state       cac;
+
+               Assert(argc == 4);
+               cac = (CAC_state) atoi(argv[3]);

Perhaps an assert or full error checking that atoi succeeds would be
useful for similar reasons to my previous comment.

From 658cba5cdb2e5c45faff84566906d2fcaa8a3674 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 18:03:03 +0300
Subject: [PATCH 7/9] Remove ConnCreate and ConnFree, and allocate Port in
stack.

Again, seems like another patch that could be pushed separately assuming
others don't have any comments.

From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 13:56:44 +0300
Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs

@@ -1499,7 +1499,7 @@ CloseServerPorts(int status, Datum arg)
{
if (ListenSocket[i] != PGINVALID_SOCKET)
{
-                       StreamClose(ListenSocket[i]);
+                       closesocket(ListenSocket[i]);
ListenSocket[i] = PGINVALID_SOCKET;
}
}

I see you have been adding log messages in the case of closesocket()
failing. Do you think it is worth doing here as well?

One strange part about this patch is that in patch 4, you edit
StreamClose() to emit a log message in the case of closesocket()
failure, but then this patch just completely removes it.

@@ -4407,11 +4420,11 @@ BackendInitialize(Port *port, CAC_state cac)
*             Doesn't return at all.
*/
static void
-BackendRun(Port *port)
+BackendRun(void)
{
/*
-        * Create a per-backend PGPROC struct in shared memory. We must do
-        * this before we can use LWLocks (in AttachSharedMemoryAndSemaphores).
+        * Create a per-backend PGPROC struct in shared memory. We must do this
+        * before we can use LWLocks (in AttachSharedMemoryAndSemaphores).
*/
InitProcess();

This comment reflow probably fits better in the patch that added the
AttachSharedMemoryAndSemaphores function.

From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 13:59:48 +0300
Subject: [PATCH 9/9] Refactor postmaster child process launching

- Move code related to launching backend processes to new source file,
process_start.c

Since this seems pretty self-contained, my be easier to review if this
was its own commit.

- Refactor the mechanism of passing informaton from the parent to
child process. Instead of using different command-line arguments
when launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of data along with all the global
variables. The contents of that blob depends on the kind of child
process being launched. In !EXEC_BACKEND mode, we use the same blob,
but it's simply inherited from the parent to child process.

Same with this. Perhaps others would disagree.

+const          PMChildEntry entry_kinds[] = {
+       {"backend", BackendMain, true},
+
+       {"autovacuum launcher", AutoVacLauncherMain, true},
+       {"autovacuum worker", AutoVacWorkerMain, true},
+       {"bgworker", BackgroundWorkerMain, true},
+       {"syslogger", SysLoggerMain, false},
+
+       {"startup", StartupProcessMain, true},
+       {"bgwriter", BackgroundWriterMain, true},
+       {"archiver", PgArchiverMain, true},
+       {"checkpointer", CheckpointerMain, true},
+       {"wal_writer", WalWriterMain, true},
+       {"wal_receiver", WalReceiverMain, true},
+};

It seems like this could be made static. I didn't see it getting exposed
in a header file anywhere, but I also admit that I can be blind at
times.

I need to spend more time looking at this last patch.

Nice work so far!

--
Tristan Partin
Neon (https://neon.tech)

#3Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#1)
Re: Refactoring backend fork+exec code

Hi,

On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:

I started to look at the code in postmaster.c related to launching child
processes. I tried to reduce the difference between EXEC_BACKEND and
!EXEC_BACKEND code paths, and put the code that needs to differ behind a
better abstraction. I started doing this to help with implementing
multi-threading, but it doesn't introduce anything thread-related yet and I
think this improves readability anyway.

Yes please! This code is absolutely awful.

From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 11:00:21 +0300
Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.

Moves InitProcess calls a little later in EXEC_BACKEND case.

What's the reason for this part? ISTM that we'd really want to get away from
plastering duplicated InitProcess() etc everywhere.

I think this might be easier to understand if you just changed did the
CreateSharedMemoryAndSemaphores() -> AttachSharedMemoryAndSemaphores() piece
in this commit, and the rest later.

+void
+AttachSharedMemoryAndSemaphores(void)
+{
+	/* InitProcess must've been called already */

Perhaps worth an assertion to make it easier to see that the order is wrong?

From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 16:33:20 +0300
Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

We went through some effort to close them in the child process. Better to
not hand them down to the child process in the first place.

I think Thomas has a larger version of this patch:
/messages/by-id/CA+hUKGKPNFcfBQduqof4-7C=avjcSfdkKBGvQoRuAvfocnvY0A@mail.gmail.com

From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 13:56:44 +0300
Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs

- Move more of the work on a client socket to the child process.

- Reduce the amount of data that needs to be passed from postmaster to
child. (Used to pass a full Port struct, although most of the fields were
empty. Now we pass the much slimmer ClientSocket.)

I think there might be extensions accessing Port. Not sure if it's worth
worrying about, but ...

--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -476,8 +476,8 @@ AutoVacLauncherMain(int argc, char *argv[])
pqsignal(SIGCHLD, SIG_DFL);
/*
-	 * Create a per-backend PGPROC struct in shared memory. We must do
-	 * this before we can use LWLocks.
+	 * Create a per-backend PGPROC struct in shared memory. We must do this
+	 * before we can use LWLocks.
*/
InitProcess();

Don't think this was intentional?

From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 13:59:48 +0300
Subject: [PATCH 9/9] Refactor postmaster child process launching

- Move code related to launching backend processes to new source file,
process_start.c

I think you might have renamed this to launch_backend.c?

- Introduce new postmaster_child_launch() function that deals with the
differences between EXEC_BACKEND and fork mode.

- Refactor the mechanism of passing informaton from the parent to
child process. Instead of using different command-line arguments
when launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of data along with all the global
variables. The contents of that blob depends on the kind of child
process being launched. In !EXEC_BACKEND mode, we use the same blob,
but it's simply inherited from the parent to child process.

+const		PMChildEntry entry_kinds[] = {
+	{"backend", BackendMain, true},
+
+	{"autovacuum launcher", AutoVacLauncherMain, true},
+	{"autovacuum worker", AutoVacWorkerMain, true},
+	{"bgworker", BackgroundWorkerMain, true},
+	{"syslogger", SysLoggerMain, false},
+
+	{"startup", StartupProcessMain, true},
+	{"bgwriter", BackgroundWriterMain, true},
+	{"archiver", PgArchiverMain, true},
+	{"checkpointer", CheckpointerMain, true},
+	{"wal_writer", WalWriterMain, true},
+	{"wal_receiver", WalReceiverMain, true},
+};

I'd assign them with the PostmasterChildType as index, so there's no danger of
getting out of order.

const PMChildEntry entry_kinds = {
[PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
...
}

or such should work.

I'd also use designated initializers for the fields, it's otherwise hard to
know what true means etc.

I think it might be good to put more into array. If we e.g. knew whether a
particular child type is a backend-like, and aux process or syslogger, we
could avoid the duplicated InitAuxiliaryProcess(),
MemoryContextDelete(PostmasterContext) etc calls everywhere.

+/*
+ * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent
+ *			to what it would be if we'd simply forked on Unix, and then
+ *			dispatch to the appropriate place.
+ *
+ * The first two command line arguments are expected to be "--forkFOO"
+ * (where FOO indicates which postmaster child we are to become), and
+ * the name of a variables file that we can read to load data that would
+ * have been inherited by fork() on Unix.  Remaining arguments go to the
+ * subprocess FooMain() routine. XXX
+ */
+void
+SubPostmasterMain(int argc, char *argv[])
+{
+	PostmasterChildType child_type;
+	char	   *startup_data;
+	size_t		startup_data_len;
+
+	/* In EXEC_BACKEND case we will not have inherited these settings */
+	IsPostmasterEnvironment = true;
+	whereToSendOutput = DestNone;
+
+	/* Setup essential subsystems (to ensure elog() behaves sanely) */
+	InitializeGUCOptions();
+
+	/* Check we got appropriate args */
+	if (argc < 3)
+		elog(FATAL, "invalid subpostmaster invocation");
+
+	if (strncmp(argv[1], "--forkchild=", 12) == 0)
+	{
+		char	   *entry_name = argv[1] + 12;
+		bool		found = false;
+
+		for (int idx = 0; idx < lengthof(entry_kinds); idx++)
+		{
+			if (strcmp(entry_kinds[idx].name, entry_name) == 0)
+			{
+				child_type = idx;
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			elog(ERROR, "unknown child kind %s", entry_name);
+	}

Hm, shouldn't we error out when called without --forkchild?

+/* Save critical backend variables into the BackendParameters struct */
+#ifndef WIN32
+static bool
+save_backend_variables(BackendParameters *param, ClientSocket *client_sock)
+#else

There's so much of this kind of thing. Could we hide it in a struct or such
instead of needing ifdefs everywhere?

--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -144,6 +144,8 @@ InitShmemAllocation(void)
/*
* Initialize ShmemVariableCache for transaction manager. (This doesn't
* really belong here, but not worth moving.)
+	 *
+	 * XXX: we really should move this
*/
ShmemVariableCache = (VariableCache)
ShmemAlloc(sizeof(*ShmemVariableCache));

Heh. Indeed. And probably just rename it to something less insane.

Greetings,

Andres Freund

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#3)
Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

Focusing on this one patch in this series:

On 11/07/2023 01:50, Andres Freund wrote:

From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 16:33:20 +0300
Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

We went through some effort to close them in the child process. Better to
not hand them down to the child process in the first place.

I think Thomas has a larger version of this patch:
/messages/by-id/CA+hUKGKPNFcfBQduqof4-7C=avjcSfdkKBGvQoRuAvfocnvY0A@mail.gmail.com

Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option
to the *accepted* socket in commit 1da569ca1f. That was part of that
thread. This patch adds the option to the *listen* sockets. That was not
discussed in that thread, but it's certainly in the same vein.

Thomas: What do you think of the attached?

On 11/07/2023 00:07, Tristan Partin wrote:

@@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
void
StreamClose(pgsocket sock)
{
-       closesocket(sock);
+       if (closesocket(sock) != 0)
+               elog(LOG, "closesocket failed: %m");
}

/*

Do you think WARNING would be a more appropriate log level?

No, WARNING is for messages that you expect the client to receive. This
failure is unexpected at the system level, the message is for the
administrator. The distinction isn't always very clear, but LOG seems
more appropriate in this case.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Use-FD_CLOEXEC-on-ListenSockets.patchtext/x-patch; charset=UTF-8; name=v2-0001-Use-FD_CLOEXEC-on-ListenSockets.patchDownload+11-10
#5Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Thu, Aug 24, 2023 at 11:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 11/07/2023 01:50, Andres Freund wrote:

From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 12 Jun 2023 16:33:20 +0300
Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets

We went through some effort to close them in the child process. Better to
not hand them down to the child process in the first place.

I think Thomas has a larger version of this patch:
/messages/by-id/CA+hUKGKPNFcfBQduqof4-7C=avjcSfdkKBGvQoRuAvfocnvY0A@mail.gmail.com

Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option
to the *accepted* socket in commit 1da569ca1f. That was part of that
thread. This patch adds the option to the *listen* sockets. That was not
discussed in that thread, but it's certainly in the same vein.

Thomas: What do you think of the attached?

LGTM. I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it. (It's a shame we can't use O_CLOFORK.)

There was some question in the other thread about whether doing that
to the server socket might affect accepted sockets too on some OS, but
I can at least confirm that your patch works fine on FreeBSD in an
EXEC_BACKEND build. I think there were some historical disagreements
about which socket properties were inherited, but not that.

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thomas Munro (#5)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 24/08/2023 15:48, Thomas Munro wrote:

LGTM. I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it. (It's a shame we can't use O_CLOFORK.)

Yeah, O_CLOFORK would be nice..

Committed, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)

#7Jeff Janes
jeff.janes@gmail.com
In reply to: Heikki Linnakangas (#6)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 24/08/2023 15:48, Thomas Munro wrote:

LGTM. I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it. (It's a shame we can't use O_CLOFORK.)

Yeah, O_CLOFORK would be nice..

Committed, thanks!

Since this commit, I'm getting a lot (63 per restart) of messages:

LOG: could not close client or listen socket: Bad file descriptor

All I have to do to get the message is turn logging_collector = on and
restart.

The close failure condition existed before the commit, it just wasn't
logged before. So, did the extra logging added here just uncover a
pre-existing bug?

The LOG message is sent to the terminal, not to the log file.

Cheers,

Jeff

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jeff Janes (#7)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 28/08/2023 18:55, Jeff Janes wrote:

Since this commit, I'm getting a lot (63 per restart) of messages:

 LOG:  could not close client or listen socket: Bad file descriptor
All I have to do to get the message is turn logging_collector = on and
restart.

The close failure condition existed before the commit, it just wasn't
logged before.  So, did the extra logging added here just uncover a
pre-existing bug?

Yes, so it seems. Syslogger is started before the ListenSockets array is
initialized, so its still all zeros. When syslogger is forked, the child
process tries to close all the listen sockets, which are all zeros. So
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
array initialization earlier.

The first close(0) actually does have an effect: it closes stdin, which
is fd 0. That is surely accidental, but I wonder if we should indeed
close stdin in child processes? The postmaster process doesn't do
anything with stdin either, although I guess a library might try to read
a passphrase from stdin before starting up, for example.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

fix-syslogger-closesocket-errors.patchtext/x-patch; charset=UTF-8; name=fix-syslogger-closesocket-errors.patchDownload+5-5
#9Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#8)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote:

On 28/08/2023 18:55, Jeff Janes wrote:

Since this commit, I'm getting a lot (63 per restart) of messages:

 LOG:  could not close client or listen socket: Bad file descriptor
All I have to do to get the message is turn logging_collector = on and
restart.

The close failure condition existed before the commit, it just wasn't
logged before.  So, did the extra logging added here just uncover a
pre-existing bug?

In case you've not noticed:
/messages/by-id/ZOvvuQe0rdj2slA9@paquier.xyz
But it does not really matter now ;)

Yes, so it seems. Syslogger is started before the ListenSockets array is
initialized, so its still all zeros. When syslogger is forked, the child
process tries to close all the listen sockets, which are all zeros. So
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
array initialization earlier.

Yep, I've reached the same conclusion. Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

The first close(0) actually does have an effect: it closes stdin, which is
fd 0. That is surely accidental, but I wonder if we should indeed close
stdin in child processes? The postmaster process doesn't do anything with
stdin either, although I guess a library might try to read a passphrase from
stdin before starting up, for example.

We would have heard about that, wouldn't we? I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.
--
Michael

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#9)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 29/08/2023 01:28, Michael Paquier wrote:

In case you've not noticed:
/messages/by-id/ZOvvuQe0rdj2slA9@paquier.xyz
But it does not really matter now ;)

Ah sorry, missed that thread.

Yes, so it seems. Syslogger is started before the ListenSockets array is
initialized, so its still all zeros. When syslogger is forked, the child
process tries to close all the listen sockets, which are all zeros. So
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
array initialization earlier.

Yep, I've reached the same conclusion. Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

Ok, pushed that way.

I checked the history of this: it goes back to commit 9a86f03b4e in
version 13. The SysLogger_Start() call used to be later, after setting p
ListenSockets, but that commit moved it. So I backpatched this to v13.

The first close(0) actually does have an effect: it closes stdin, which is
fd 0. That is surely accidental, but I wonder if we should indeed close
stdin in child processes? The postmaster process doesn't do anything with
stdin either, although I guess a library might try to read a passphrase from
stdin before starting up, for example.

We would have heard about that, wouldn't we? I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.

Yes, syslogger is the only process that closes stdin. After moving the
initialization, it doesn't close it either.

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.

--
Heikki Linnakangas
Neon (https://neon.tech)

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#10)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 29/08/2023 09:21, Heikki Linnakangas wrote:

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.

Like this.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

0001-Refactor-ListenSocket-array.patchtext/x-patch; charset=UTF-8; name=0001-Refactor-ListenSocket-array.patchDownload+36-61
#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#11)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 29/08/2023 09:58, Heikki Linnakangas wrote:

On 29/08/2023 09:21, Heikki Linnakangas wrote:

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.

Like this.

This seems pretty uncontroversial, and I heard no objections, so I went
ahead and committed that.

--
Heikki Linnakangas
Neon (https://neon.tech)

#13Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#12)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Thu, Oct 05, 2023 at 03:08:37PM +0300, Heikki Linnakangas wrote:

This seems pretty uncontroversial, and I heard no objections, so I went
ahead and committed that.

It looks like e29c4643951 is causing issues here. While doing
benchmarking on a cluster compiled with -O2, I got a crash:
LOG: system logger process (PID 27924) was terminated by signal 11: Segmentation fault

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000055ef3b9aed20 in pfree ()
(gdb) bt
#0 0x000055ef3b9aed20 in pfree ()
#1 0x000055ef3b7e0e41 in ClosePostmasterPorts ()
#2 0x000055ef3b7e6649 in SysLogger_Start ()
#3 0x000055ef3b7e4413 in PostmasterMain ()

Okay, the backtrace is not that useful. I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.
--
Michael

#14Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#13)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:

Okay, the backtrace is not that useful. I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.

And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
*((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1 0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
#3 0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4 0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
at postmaster.c:1148
#5 0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
postmaster.c:2571
2571 pfree(ListenSockets);
(gdb) p ListenSockets $1 = (pgsocket *) 0x0
--
Michael

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Michael Paquier (#14)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On 06/10/2023 09:50, Michael Paquier wrote:

On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:

Okay, the backtrace is not that useful. I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.

And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header =
*((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1 0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
#3 0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4 0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00)
at postmaster.c:1148
#5 0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at
postmaster.c:2571
2571 pfree(ListenSockets);
(gdb) p ListenSockets $1 = (pgsocket *) 0x0

Fixed, thanks!

I did a quick test with syslogger enabled before committing, but didn't
notice the segfault. I missed it because syslogger gets restarted and
then it worked.

--
Heikki Linnakangas
Neon (https://neon.tech)

#16Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#15)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

On Fri, Oct 06, 2023 at 10:27:22AM +0300, Heikki Linnakangas wrote:

I did a quick test with syslogger enabled before committing, but didn't
notice the segfault. I missed it because syslogger gets restarted and then
it worked.

Thanks, Heikki.
--
Michael

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#3)
Re: Refactoring backend fork+exec code

I updated this patch set, addressing some of the straightforward
comments from Tristan and Andres, and did some more cleanups, commenting
etc. Works on Windows now.

Replies to some of the individual comments below:

On 11/07/2023 00:07, Tristan Partin wrote:

@@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
* returns the pid of the fork/exec'd process, or -1 on failure
*/
static pid_t
-backend_forkexec(Port *port)
+backend_forkexec(Port *port, CAC_state cac)
{
-       char       *av[4];
+       char       *av[5];
int                     ac = 0;
+       char            cacbuf[10];

av[ac++] = "postgres";
av[ac++] = "--forkbackend";
av[ac++] = NULL; /* filled in by internal_forkexec */

+       snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
+       av[ac++] = cacbuf;

Might be worth a sanity check that there wasn't any truncation into
cacbuf, which is an impossibility as the code is written, but still
useful for catching a future developer error.

Is it worth adding a command line option at all instead of having the
naked positional argument? It would help anybody who might read the
command line what the seemingly random integer stands for.

+1. This gets refactored away in the last patch though. In the last
patch, I used a child process name instead of an integer precisely
because it looks nicer in "ps".

I wonder if we should add more command line arguments, just for
informational purposes. Autovacuum worker process could display the
database name it's connected to, for example. I don't know how important
the command line is on Windows, is it displayed by tools that people
care about?

On 11/07/2023 01:50, Andres Freund wrote:

On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:

From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 11:00:21 +0300
Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.

Moves InitProcess calls a little later in EXEC_BACKEND case.

What's the reason for this part?

The point is that with this commit, InitProcess() is called at same
place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent
that way.

ISTM that we'd really want to get away from plastering duplicated
InitProcess() etc everywhere.

Sure, we could do more to reduce the duplication. I think this is a step
in the right direction, though.

From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 18 Jun 2023 13:56:44 +0300
Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs

- Move more of the work on a client socket to the child process.

- Reduce the amount of data that needs to be passed from postmaster to
child. (Used to pass a full Port struct, although most of the fields were
empty. Now we pass the much slimmer ClientSocket.)

I think there might be extensions accessing Port. Not sure if it's worth
worrying about, but ...

That's OK. Port still exists, it's just created a little later. It will
be initialized by the time extensions might look at it.

+const		PMChildEntry entry_kinds[] = {
+	{"backend", BackendMain, true},
+
+	{"autovacuum launcher", AutoVacLauncherMain, true},
+	{"autovacuum worker", AutoVacWorkerMain, true},
+	{"bgworker", BackgroundWorkerMain, true},
+	{"syslogger", SysLoggerMain, false},
+
+	{"startup", StartupProcessMain, true},
+	{"bgwriter", BackgroundWriterMain, true},
+	{"archiver", PgArchiverMain, true},
+	{"checkpointer", CheckpointerMain, true},
+	{"wal_writer", WalWriterMain, true},
+	{"wal_receiver", WalReceiverMain, true},
+};

I'd assign them with the PostmasterChildType as index, so there's no danger of
getting out of order.

const PMChildEntry entry_kinds = {
[PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
...
}

or such should work.

Nice, I didn't know about that syntax! Changed it that way.

I'd also use designated initializers for the fields, it's otherwise hard to
know what true means etc.

I think with one boolean and the struct declaration nearby, it's fine.
If this becomes more complex in the future, with more fields, I agree.

I think it might be good to put more into array. If we e.g. knew whether a
particular child type is a backend-like, and aux process or syslogger, we
could avoid the duplicated InitAuxiliaryProcess(),
MemoryContextDelete(PostmasterContext) etc calls everywhere.

I agree we could do more refactoring here. I don't agree with adding
more to this struct though. I'm trying to limit the code in
launch_backend.c to hiding the differences between EXEC_BACKEND and
!EXEC_BACKEND. In EXEC_BACKEND mode, it restores the child process to
the same state as it is after fork() in !EXEC_BACKEND mode. Any other
initialization steps belong elsewhere.

Some of the steps between InitPostmasterChild() and the *Main()
functions could probably be moved around and refactored. I didn't think
hard about that. I think that can be done separately as follow-up patch.

+/* Save critical backend variables into the BackendParameters struct */
+#ifndef WIN32
+static bool
+save_backend_variables(BackendParameters *param, ClientSocket *client_sock)
+#else

There's so much of this kind of thing. Could we hide it in a struct or such
instead of needing ifdefs everywhere?

A lot of #ifdefs you mean? I agree launch_backend.c has a lot of those.
I haven't come up with any good ideas on reducing them, unfortunately.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v2-0001-Pass-background-worker-entry-in-the-parameter-fil.patchtext/x-patch; charset=UTF-8; name=v2-0001-Pass-background-worker-entry-in-the-parameter-fil.patchDownload+26-39
v2-0002-Refactor-CreateSharedMemoryAndSemaphores.patchtext/x-patch; charset=UTF-8; name=v2-0002-Refactor-CreateSharedMemoryAndSemaphores.patchDownload+44-64
v2-0003-Pass-CAC-as-argument-to-backend-process.patchtext/x-patch; charset=UTF-8; name=v2-0003-Pass-CAC-as-argument-to-backend-process.patchDownload+31-25
v2-0004-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patchtext/x-patch; charset=UTF-8; name=v2-0004-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patchDownload+11-60
v2-0005-Introduce-ClientSocket-rename-some-funcs.patchtext/x-patch; charset=UTF-8; name=v2-0005-Introduce-ClientSocket-rename-some-funcs.patchDownload+149-127
v2-0006-Refactor-postmaster-child-process-launching.patchtext/x-patch; charset=UTF-8; name=v2-0006-Refactor-postmaster-child-process-launching.patchDownload+1789-2008
#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#17)
Re: Refactoring backend fork+exec code

On 11/10/2023 14:12, Heikki Linnakangas wrote:

On 11/07/2023 01:50, Andres Freund wrote:

Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.

Moves InitProcess calls a little later in EXEC_BACKEND case.

What's the reason for this part?

The point is that with this commit, InitProcess() is called at same
place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent
that way.

ISTM that we'd really want to get away from plastering duplicated
InitProcess() etc everywhere.

Sure, we could do more to reduce the duplication. I think this is a step
in the right direction, though.

Here's another rebased patch set. Compared to previous version, I did a
little more refactoring around CreateSharedMemoryAndSemaphores and
InitProcess:

- patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
CreateSharedMemoryAndSemaphores is now only called at postmaster
startup, and a new function called AttachSharedMemoryStructs() is called
in backends in EXEC_BACKEND mode. I extracted the common parts of those
functions to a new static function. (Some of this refactoring used to be
part of the 3rd patch in the series, but it seems useful on its own, so
I split it out.)

- patch 3 moves the call to AttachSharedMemoryStructs() to
InitProcess(), reducing the boilerplate code a little.

The patches are incrementally useful, so if you don't have time to
review all of them, a review on a subset would be useful too.

--
Heikki Linnakangas
Neon (https://neon.tech)

Attachments:

v3-0001-Refactor-CreateSharedMemoryAndSemaphores.patchtext/x-patch; charset=UTF-8; name=v3-0001-Refactor-CreateSharedMemoryAndSemaphores.patchDownload+117-90
v3-0002-Pass-BackgroundWorker-entry-in-the-parameter-file.patchtext/x-patch; charset=UTF-8; name=v3-0002-Pass-BackgroundWorker-entry-in-the-parameter-file.patchDownload+25-38
v3-0003-Refactor-how-InitProcess-is-called.patchtext/x-patch; charset=UTF-8; name=v3-0003-Refactor-how-InitProcess-is-called.patchDownload+25-62
v3-0004-Pass-CAC-as-argument-to-backend-process.patchtext/x-patch; charset=UTF-8; name=v3-0004-Pass-CAC-as-argument-to-backend-process.patchDownload+31-25
v3-0005-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patchtext/x-patch; charset=UTF-8; name=v3-0005-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patchDownload+11-60
v3-0006-Introduce-ClientSocket-rename-some-funcs.patchtext/x-patch; charset=UTF-8; name=v3-0006-Introduce-ClientSocket-rename-some-funcs.patchDownload+149-127
v3-0007-Refactor-postmaster-child-process-launching.patchtext/x-patch; charset=UTF-8; name=v3-0007-Refactor-postmaster-child-process-launching.patchDownload+1787-2006
#19Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#18)
Re: Refactoring backend fork+exec code

On Wed Nov 29, 2023 at 5:36 PM CST, Heikki Linnakangas wrote:

On 11/10/2023 14:12, Heikki Linnakangas wrote:
Here's another rebased patch set. Compared to previous version, I did a
little more refactoring around CreateSharedMemoryAndSemaphores and
InitProcess:

- patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
CreateSharedMemoryAndSemaphores is now only called at postmaster
startup, and a new function called AttachSharedMemoryStructs() is called
in backends in EXEC_BACKEND mode. I extracted the common parts of those
functions to a new static function. (Some of this refactoring used to be
part of the 3rd patch in the series, but it seems useful on its own, so
I split it out.)

- patch 3 moves the call to AttachSharedMemoryStructs() to
InitProcess(), reducing the boilerplate code a little.

The patches are incrementally useful, so if you don't have time to
review all of them, a review on a subset would be useful too.

From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 30 Nov 2023 00:04:22 +0200
Subject: [PATCH v3 4/7] Pass CAC as argument to backend process

For me, being new to the code, it would be nice to have more of an
explanation as to why this is "better." I don't doubt it; it would just
help me and future readers of this commit in the future. More of an
explanation in the commit message would suffice.

My other comment on this commit is that we now seem to have lost the
context on what CAC stands for. Before we had the member variable to
explain it. A comment on the enum would be great or changing cac named
variables to canAcceptConnections. I did notice in patch 7 that there
are still some variables named canAcceptConnections around, so I'll
leave this comment up to you.

From 98f8397b32a0b36e221475b32697c9c5bbca86a0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 11 Oct 2023 13:38:06 +0300
Subject: [PATCH v3 5/7] Remove ConnCreate and ConnFree, and allocate Port in
stack

I like it separate.

From 79aab42705a8cb0e16e61c33052fc56fdd4fca76 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 11 Oct 2023 13:38:10 +0300
Subject: [PATCH v3 6/7] Introduce ClientSocket, rename some funcs

+static int BackendStartup(ClientSocket *port);

s/port/client_sock

-                port->remote_hostname = strdup(remote_host);
+                port->remote_hostname = pstrdup(remote_host);
+        MemoryContextSwitchTo(oldcontext);

Something funky with the whitespace here, but my eyes might also be
playing tricks on me. Mixing spaces in tabs like what do in this
codebase makes it difficult to review :).

From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 30 Nov 2023 00:07:34 +0200
Subject: [PATCH v3 7/7] Refactor postmaster child process launching

+                entry_kinds[child_type].main_fn(startup_data, startup_data_len);
+                Assert(false);

Seems like you want the pg_unreachable() macro here instead of
Assert(false). Similar comment at the end of SubPostmasterMain().

+        if (fwrite(param, paramsz, 1, fp) != 1)
+        {
+                ereport(LOG,
+                                (errcode_for_file_access(),
+                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
+                FreeFile(fp);
+                return -1;
+        }
+
+        /* Release file */
+        if (FreeFile(fp))
+        {
+                ereport(LOG,
+                                (errcode_for_file_access(),
+                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
+                return -1;
+        }

Two pieces of feedback here. I generally find write(2) more useful than
fwrite(3) because write(2) will report a useful errno, whereas fwrite(2)
just uses ferror(3). The additional errno information might be valuable
context in the log message. Up to you if you think it is also valuable.

The log message if FreeFile() fails doesn't seem to make sense to me.
I didn't see any file writing in that code path, but it is possible that
I missed something.

+        /*
+         * Set reference point for stack-depth checking.  This might seem
+         * redundant in !EXEC_BACKEND builds; but it's not because the postmaster
+         * launches its children from signal handlers, so we might be running on
+         * an alternative stack. XXX still true?
+         */
+        (void) set_stack_base();

Looks like there is still this XXX left. Can't say I completely
understand the second sentence either.

+        /*
+         * make sure stderr is in binary mode before anything can possibly be
+         * written to it, in case it's actually the syslogger pipe, so the pipe
+         * chunking protocol isn't disturbed. Non-logpipe data gets translated on
+         * redirection (e.g. via pg_ctl -l) anyway.
+         */

Nit: The 'm' in the first "make" should be capitalized.

+        if (fread(&param, sizeof(param), 1, fp) != 1)
+        {
+                write_stderr("could not read from backend variables file \"%s\": %s\n",
+                                         id, strerror(errno));
+                exit(1);
+        }
+
+        /* read startup data */
+        *startup_data_len = param.startup_data_len;
+        if (param.startup_data_len > 0)
+        {
+                *startup_data = palloc(*startup_data_len);
+                if (fread(*startup_data, *startup_data_len, 1, fp) != 1)
+                {
+                        write_stderr("could not read startup data from backend variables file \"%s\": %s\n",
+                                                 id, strerror(errno));
+                        exit(1);
+                }
+        }

fread(3) doesn't set errno. I would probably switch these to read(2) for
the reason I wrote in a previous comment.

+        /*
+         * Need to reinitialize the SSL library in the backend, since the context
+         * structures contain function pointers and cannot be passed through the
+         * parameter file.
+         *
+         * If for some reason reload fails (maybe the user installed broken key
+         * files), soldier on without SSL; that's better than all connections
+         * becoming impossible.
+         *
+         * XXX should we do this in all child processes?  For the moment it's
+         * enough to do it in backend children.
+         */
+#ifdef USE_SSL
+        if (EnableSSL)
+        {
+                if (secure_initialize(false) == 0)
+                        LoadedSSL = true;
+                else
+                        ereport(LOG,
+                                        (errmsg("SSL configuration could not be loaded in child process")));
+        }
+#endif

Do other child process types do any non-local communication?

-typedef struct ClientSocket {
+struct ClientSocket
+{
pgsocket        sock;                        /* File descriptor */
SockAddr        laddr;                        /* local addr (postmaster) */
SockAddr        raddr;                        /* remote addr (client) */
-} ClientSocket;
+};
+typedef struct ClientSocket ClientSocket;

Can't say I completely understand the reason for this change given it
was added in your series.

I didn't look too hard at the Windows-specific code, so maybe someone
who knows Windows will have something to say, but it also might've just
been copy-paste that I didn't realize.

There were a few more XXXs that probably should be figured out before
committing. Though perhaps some of them were already there.

Patches 1-3 seem committable as-is. I only had minor comments on
everything but 7, so after taking a look at those, they could be
committed.

Overall, this seems liked a marked improvement :).

--
Tristan Partin
Neon (https://neon.tech)

#20Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#18)
Re: Refactoring backend fork+exec code

Hi,

On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:

- patch 1 splits CreateSharedMemoryAndSemaphores into two functions:
CreateSharedMemoryAndSemaphores is now only called at postmaster startup,
and a new function called AttachSharedMemoryStructs() is called in backends
in EXEC_BACKEND mode. I extracted the common parts of those functions to a
new static function. (Some of this refactoring used to be part of the 3rd
patch in the series, but it seems useful on its own, so I split it out.)

I like that idea.

From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 30 Nov 2023 00:01:25 +0200
Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores

For clarity, have separate functions for *creating* the shared memory
and semaphores at postmaster or single-user backend startup, and
for *attaching* to existing shared memory structures in EXEC_BACKEND
case. CreateSharedMemoryAndSemaphores() is now called only at
postmaster startup, and a new AttachSharedMemoryStructs() function is
called at backend startup in EXEC_BACKEND mode.

I assume CreateSharedMemoryAndSemaphores() is still called during crash
restart? I wonder if it shouldn't split three ways:
1) create
2) initialize
3) attach

From 3478cafcf74a5c8d649e0287e6c72669a29c0e70 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 30 Nov 2023 00:02:03 +0200
Subject: [PATCH v3 2/7] Pass BackgroundWorker entry in the parameter file in
EXEC_BACKEND mode

This makes it possible to move InitProcess later in SubPostmasterMain
(in next commit), as we no longer need to access shared memory to read
background worker entry.

static void read_backend_variables(char *id, Port *port);
@@ -4831,7 +4833,7 @@ SubPostmasterMain(int argc, char *argv[])
strcmp(argv[1], "--forkavlauncher") == 0 ||
strcmp(argv[1], "--forkavworker") == 0 ||
strcmp(argv[1], "--forkaux") == 0 ||
-		strncmp(argv[1], "--forkbgworker=", 15) == 0)
+		strncmp(argv[1], "--forkbgworker", 14) == 0)
PGSharedMemoryReAttach();
else
PGSharedMemoryNoReAttach();
@@ -4962,10 +4964,8 @@ SubPostmasterMain(int argc, char *argv[])
AutoVacWorkerMain(argc - 2, argv + 2);	/* does not return */
}
-	if (strncmp(argv[1], "--forkbgworker=", 15) == 0)
+	if (strncmp(argv[1], "--forkbgworker", 14) == 0)

Now that we don't need to look at parameters anymore, these should probably be
just a strcmp(), like the other cases?

From 0d071474e12a70ff8113c7b0731c5b97fec45007 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 29 Nov 2023 23:47:25 +0200
Subject: [PATCH v3 3/7] Refactor how InitProcess is called

The order of process initialization steps is now more consistent
between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called
at the same place in either mode. We can now also move the
AttachSharedMemoryStructs() call into InitProcess() itself. This
reduces the number of "#ifdef EXEC_BACKEND" blocks.

Yay.

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index cdfdd6fbe1d..6c708777dde 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -461,6 +461,12 @@ InitProcess(void)
*/
InitLWLockAccess();
InitDeadLockChecking();
+
+#ifdef EXEC_BACKEND
+	/* Attach process to shared data structures */
+	if (IsUnderPostmaster)
+		AttachSharedMemoryStructs();
+#endif
}
/*
@@ -614,6 +620,12 @@ InitAuxiliaryProcess(void)
* Arrange to clean up at process exit.
*/
on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype));
+
+#ifdef EXEC_BACKEND
+	/* Attach process to shared data structures */
+	if (IsUnderPostmaster)
+		AttachSharedMemoryStructs();
+#endif
}

Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call
InitLWLockAccess().

I think a short comment explaining why we can attach to shmem structs after
already accessing shared memory earlier in the function would be worthwhile.

From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 30 Nov 2023 00:07:34 +0200
Subject: [PATCH v3 7/7] Refactor postmaster child process launching

- Move code related to launching backend processes to new source file,
launch_backend.c

- Introduce new postmaster_child_launch() function that deals with the
differences between EXEC_BACKEND and fork mode.

- Refactor the mechanism of passing information from the parent to
child process. Instead of using different command-line arguments when
launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of data along with all the global variables. The
contents of that blob depend on the kind of child process being
launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply
inherited from the parent to child process.

[...]
33 files changed, 1787 insertions(+), 2002 deletions(-)

Well, that's not small...

I think it may be worth splitting some of the file renaming out into a
separate commit, makes it harder to see what changed here.

+AutoVacLauncherMain(char *startup_data, size_t startup_data_len)
{
-	pid_t		AutoVacPID;
+	sigjmp_buf	local_sigjmp_buf;
-#ifdef EXEC_BACKEND
-	switch ((AutoVacPID = avlauncher_forkexec()))
-#else
-	switch ((AutoVacPID = fork_process()))
-#endif
+	/* Release postmaster's working memory context */
+	if (PostmasterContext)
{
-		case -1:
-			ereport(LOG,
-					(errmsg("could not fork autovacuum launcher process: %m")));
-			return 0;
-
-#ifndef EXEC_BACKEND
-		case 0:
-			/* in postmaster child ... */
-			InitPostmasterChild();
-
-			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
-
-			AutoVacLauncherMain(0, NULL);
-			break;
-#endif
-		default:
-			return (int) AutoVacPID;
+		MemoryContextDelete(PostmasterContext);
+		PostmasterContext = NULL;
}

- /* shouldn't get here */
- return 0;
-}

This if (PostmasterContext) ... else "shouldn't get here" business seems
pretty silly, more likely to hide problems than to help.

+/*
+ * Information needed to launch different kinds of child processes.
+ */
+static const struct
+{
+	const char *name;
+	void		(*main_fn) (char *startup_data, size_t startup_data_len);
+	bool		shmem_attach;
+}			entry_kinds[] = {
+	[PMC_BACKEND] = {"backend", BackendMain, true},

Personally I'd give the struct an actual name - makes the debugging experience
a bit nicer than anonymous structs that you can't even reference by a typedef.

+	[PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
+	[PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
+	[PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
+	[PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
+
+	[PMC_STARTUP] = {"startup", StartupProcessMain, true},
+	[PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
+	[PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
+	[PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
+	[PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
+	[PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
+};

It feels like we have too many different ways of documenting the type of a
process. This new PMC_ stuff, enum AuxProcType, enum BackendType. Which then
leads to code like this:

-CheckpointerMain(void)
+CheckpointerMain(char *startup_data, size_t startup_data_len)
{
sigjmp_buf	local_sigjmp_buf;
MemoryContext checkpointer_context;
+	Assert(startup_data_len == 0);
+
+	MyAuxProcType = CheckpointerProcess;
+	MyBackendType = B_CHECKPOINTER;
+	AuxiliaryProcessInit();
+

For each type of child process. That seems a bit too redundant. Can't we
unify this at least somewhat? Can't we just reuse BackendType here? Sure,
there'd be pointless entry for B_INVALID, but that doesn't seem like a
problem, could even be useful, by pointing it to a function raising an error.

At the very least this shouldn't deviate from the naming pattern of
BackendType.

+/*
+ * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent
+ *			to what it would be if we'd simply forked on Unix, and then
+ *			dispatch to the appropriate place.
+ *
+ * The first two command line arguments are expected to be "--forkchild=<name>",
+ * where <name> indicates which postmaster child we are to become, and
+ * the name of a variables file that we can read to load data that would
+ * have been inherited by fork() on Unix.
+ */
+void
+SubPostmasterMain(int argc, char *argv[])
+{
+	PostmasterChildType child_type;
+	char	   *startup_data;
+	size_t		startup_data_len;
+	char	   *entry_name;
+	bool		found = false;
+
+	/* In EXEC_BACKEND case we will not have inherited these settings */
+	IsPostmasterEnvironment = true;
+	whereToSendOutput = DestNone;
+
+	/* Setup essential subsystems (to ensure elog() behaves sanely) */
+	InitializeGUCOptions();
+
+	/* Check we got appropriate args */
+	if (argc != 3)
+		elog(FATAL, "invalid subpostmaster invocation");
+
+	if (strncmp(argv[1], "--forkchild=", 12) != 0)
+		elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)");
+	entry_name = argv[1] + 12;
+	found = false;
+	for (int idx = 0; idx < lengthof(entry_kinds); idx++)
+	{
+		if (strcmp(entry_kinds[idx].name, entry_name) == 0)
+		{
+			child_type = idx;
+			found = true;
+			break;
+		}
+	}
+	if (!found)
+		elog(ERROR, "unknown child kind %s", entry_name);

If we then have to search linearly, why don't we just pass the index into the
array?

-#define StartupDataBase()		StartChildProcess(StartupProcess)
-#define StartArchiver()			StartChildProcess(ArchiverProcess)
-#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
-#define StartCheckpointer()		StartChildProcess(CheckpointerProcess)
-#define StartWalWriter()		StartChildProcess(WalWriterProcess)
-#define StartWalReceiver()		StartChildProcess(WalReceiverProcess)
+#define StartupDataBase()		StartChildProcess(PMC_STARTUP)
+#define StartArchiver()			StartChildProcess(PMC_ARCHIVER)
+#define StartBackgroundWriter() StartChildProcess(PMC_BGWRITER)
+#define StartCheckpointer()		StartChildProcess(PMC_CHECKPOINTER)
+#define StartWalWriter()		StartChildProcess(PMC_WAL_WRITER)
+#define StartWalReceiver()		StartChildProcess(PMC_WAL_RECEIVER)
+
+#define StartAutoVacLauncher()	StartChildProcess(PMC_AV_LAUNCHER);
+#define StartAutoVacWorker()	StartChildProcess(PMC_AV_WORKER);

Obviously not your fault, but these macros are so pointless... Making it
harder to find where we start child processes, all to save a a few characters
in one place, while adding considerably more in others.

+void
+BackendMain(char *startup_data, size_t startup_data_len)
+{

Is there any remaining reason for this to live in postmaster.c? Given that
other backend types don't, that seems oddly assymmetrical.

Greetings,

Andres Freund

#21Andres Freund
andres@anarazel.de
In reply to: Tristan Partin (#19)
#22Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#21)
#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#20)
#24Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#20)
#25Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#24)
#26Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tristan Partin (#19)
#27Alexander Lakhin
exclusion@gmail.com
In reply to: Heikki Linnakangas (#26)
#28Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#26)
#29Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Lakhin (#27)
#30Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tristan Partin (#30)
#32Alexander Lakhin
exclusion@gmail.com
In reply to: Heikki Linnakangas (#29)
#33Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Lakhin (#32)
#34Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#20)
#35Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#34)
#36Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#35)
#37Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#37)
#39Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#38)
#40Reid Thompson
reid.thompson@crunchydata.com
In reply to: Heikki Linnakangas (#39)
#41Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Reid Thompson (#40)
#42Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#41)
#44Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#43)
#45Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#44)
#46Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#45)
#47Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#46)
#48Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#47)
#49Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#48)
#50Richard Guo
guofenglinux@gmail.com
In reply to: Heikki Linnakangas (#48)
#51Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Richard Guo (#50)
#52Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#49)
#53Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tristan Partin (#52)
#54Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#54)
#56Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#55)
#57Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Heikki Linnakangas (#56)
#58Anton A. Melnikov
a.melnikov@postgrespro.ru
In reply to: Jelte Fennema-Nio (#57)
#59Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Anton A. Melnikov (#58)
#60Anton A. Melnikov
a.melnikov@postgrespro.ru
In reply to: Heikki Linnakangas (#59)
#61Thomas Munro
thomas.munro@gmail.com
In reply to: Heikki Linnakangas (#54)
#62Nathan Bossart
nathandbossart@gmail.com
In reply to: Thomas Munro (#61)
#63Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thomas Munro (#61)