Remove pthread_is_threaded_np() checks in postmaster

Started by Tristan Partinalmost 2 years ago8 messages
#1Tristan Partin
tristan@neon.tech
2 attachment(s)

These checks are not effective for what they are trying to prevent.
A recent commit[0]https://github.com/curl/curl/commit/8b7cbe9decc205b08ec8258eb184c89a33e3084b in libcurl when used on macOS has been tripping the
pthread_is_threaded_np() check in postmaster.c for
shared_preload_libraries entries which use libcurl (like Neon). Under
the hood, libcurl calls SCDynamicStoreCopyProxies[1]https://developer.apple.com/documentation/systemconfiguration/1517088-scdynamicstorecopyproxies, which apparently
causes the check to fail.

Attached is a patch to remove the check, and a minimal reproducer
(curlexe.zip) for others to run on macOS.

Here are some logs:

Postgres working as expected:

$ LC_ALL="C" /opt/homebrew/opt/postgresql@16/bin/postgres -D /opt/homebrew/var/postgresql@16
2024-01-22 23:18:51.203 GMT [50388] LOG: starting PostgreSQL 16.1 (Homebrew) on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
2024-01-22 23:18:51.204 GMT [50388] LOG: listening on IPv6 address "::1", port 5432
2024-01-22 23:18:51.204 GMT [50388] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-01-22 23:18:51.205 GMT [50388] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-01-22 23:18:51.207 GMT [50391] LOG: database system was shut down at 2023-12-21 23:12:10 GMT
2024-01-22 23:18:51.211 GMT [50388] LOG: database system is ready to accept connections
^C2024-01-22 23:18:53.797 GMT [50388] LOG: received fast shutdown request
2024-01-22 23:18:53.798 GMT [50388] LOG: aborting any active transactions
2024-01-22 23:18:53.800 GMT [50388] LOG: background worker "logical replication launcher" (PID 50394) exited with exit code 1
2024-01-22 23:18:53.801 GMT [50389] LOG: shutting down
2024-01-22 23:18:53.801 GMT [50389] LOG: checkpoint starting: shutdown immediate
2024-01-22 23:18:53.805 GMT [50389] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/4BE77E0, redo lsn=0/4BE77E0
2024-01-22 23:18:53.809 GMT [50388] LOG: database system is shut down

Postgres not working with attached extension preloaded:

$ echo shared_preload_libraries=curlexe >> /opt/homebrew/var/postgresql@16/postgresql.conf
$ LC_ALL="C" /opt/homebrew/opt/postgresql@16/bin/postgres -D /opt/homebrew/var/postgresql@16
2024-01-22 23:19:01.108 GMT [50395] LOG: starting PostgreSQL 16.1 (Homebrew) on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
2024-01-22 23:19:01.110 GMT [50395] LOG: listening on IPv6 address "::1", port 5432
2024-01-22 23:19:01.110 GMT [50395] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-01-22 23:19:01.111 GMT [50395] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-01-22 23:19:01.113 GMT [50395] FATAL: postmaster became multithreaded during startup
2024-01-22 23:19:01.113 GMT [50395] HINT: Set the LC_ALL environment variable to a valid locale.
2024-01-22 23:19:01.114 GMT [50395] LOG: database system is shut down

Same as previous, but without IPv6 support in libcurl:

$ DYLD_LIBRARY_PATH=/opt/homebrew/opt/curl-without-ipv6/lib LC_ALL="C" /opt/homebrew/opt/postgresql@16/bin/postgres -D /opt/homebrew/var/postgresql@16
2024-01-22 23:23:17.151 GMT [50546] LOG: starting PostgreSQL 16.1 (Homebrew) on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
2024-01-22 23:23:17.152 GMT [50546] LOG: listening on IPv6 address "::1", port 5432
2024-01-22 23:23:17.152 GMT [50546] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-01-22 23:23:17.152 GMT [50546] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-01-22 23:23:17.155 GMT [50549] LOG: database system was shut down at 2024-01-22 23:23:10 GMT
2024-01-22 23:23:17.158 GMT [50546] LOG: database system is ready to accept connections
^C2024-01-22 23:23:26.997 GMT [50546] LOG: received fast shutdown request
2024-01-22 23:23:26.998 GMT [50546] LOG: aborting any active transactions
2024-01-22 23:23:27.000 GMT [50546] LOG: background worker "logical replication launcher" (PID 50552) exited with exit code 1
2024-01-22 23:23:27.000 GMT [50547] LOG: shutting down
2024-01-22 23:23:27.001 GMT [50547] LOG: checkpoint starting: shutdown immediate
2024-01-22 23:23:27.002 GMT [50547] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/4BE78D0, redo lsn=0/4BE78D0
2024-01-22 23:23:27.005 GMT [50546] LOG: database system is shut down

[0]: https://github.com/curl/curl/commit/8b7cbe9decc205b08ec8258eb184c89a33e3084b
[1]: https://developer.apple.com/documentation/systemconfiguration/1517088-scdynamicstorecopyproxies

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

curlexe.zipapplication/zip; name=curlexe.zipDownload
v1-0001-Remove-pthread_is_threaded_np-checks-in-postmaste.patchtext/x-patch; charset=utf-8; name=v1-0001-Remove-pthread_is_threaded_np-checks-in-postmaste.patchDownload
From 0a514e7a8ea2af21d98cbcc2e4da799745786155 Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Tue, 23 Jan 2024 13:14:33 -0600
Subject: [PATCH v1] Remove pthread_is_threaded_np() checks in postmaster

These checks are not effective for what they are trying to prevent.
Libraries that are preloaded can easily cause this check to fail.
---
 configure                           |  2 +-
 configure.ac                        |  1 -
 meson.build                         |  1 -
 src/backend/postmaster/postmaster.c | 27 ---------------------------
 src/include/pg_config.h.in          |  3 ---
 5 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/configure b/configure
index 70a1968003..e790015c2f 100755
--- a/configure
+++ b/configure
@@ -15573,7 +15573,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s posix_fallocate ppoll setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 52fd7af446..69356df254 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1784,7 +1784,6 @@ AC_CHECK_FUNCS(m4_normalize([
 	memset_s
 	posix_fallocate
 	ppoll
-	pthread_is_threaded_np
 	setproctitle
 	setproctitle_fast
 	strchrnul
diff --git a/meson.build b/meson.build
index 55184db248..56573c1585 100644
--- a/meson.build
+++ b/meson.build
@@ -2441,7 +2441,6 @@ func_checks = [
   ['ppoll'],
   ['pstat'],
   ['pthread_barrier_wait', {'dependencies': [thread_dep]}],
-  ['pthread_is_threaded_np', {'dependencies': [thread_dep]}],
   ['sem_init', {'dependencies': [rt_dep, thread_dep], 'skip': sema_kind != 'unnamed_posix', 'define': false}],
   ['setproctitle', {'dependencies': [util_dep]}],
   ['setproctitle_fast'],
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index feb471dd1d..40968aadc5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1426,24 +1426,6 @@ PostmasterMain(int argc, char *argv[])
 		 */
 	}
 
-#ifdef HAVE_PTHREAD_IS_THREADED_NP
-
-	/*
-	 * On macOS, libintl replaces setlocale() with a version that calls
-	 * CFLocaleCopyCurrent() when its second argument is "" and every relevant
-	 * environment variable is unset or empty.  CFLocaleCopyCurrent() makes
-	 * the process multithreaded.  The postmaster calls sigprocmask() and
-	 * calls fork() without an immediate exec(), both of which have undefined
-	 * behavior in a multithreaded program.  A multithreaded postmaster is the
-	 * normal case on Windows, which offers neither fork() nor sigprocmask().
-	 */
-	if (pthread_is_threaded_np() != 0)
-		ereport(FATAL,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("postmaster became multithreaded during startup"),
-				 errhint("Set the LC_ALL environment variable to a valid locale.")));
-#endif
-
 	/*
 	 * Remember postmaster startup time
 	 */
@@ -1849,15 +1831,6 @@ ServerLoop(void)
 		if (StartWorkerNeeded || HaveCrashedWorker)
 			maybe_start_bgworkers();
 
-#ifdef HAVE_PTHREAD_IS_THREADED_NP
-
-		/*
-		 * With assertions enabled, check regularly for appearance of
-		 * additional threads.  All builds check at start and exit.
-		 */
-		Assert(pthread_is_threaded_np() == 0);
-#endif
-
 		/*
 		 * Lastly, check to see if it's time to do some things that we don't
 		 * want to do every single time through the loop, because they're a
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 288bb9cb42..922882b26f 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -336,9 +336,6 @@
 /* Define to 1 if you have the `pthread_barrier_wait' function. */
 #undef HAVE_PTHREAD_BARRIER_WAIT
 
-/* Define to 1 if you have the `pthread_is_threaded_np' function. */
-#undef HAVE_PTHREAD_IS_THREADED_NP
-
 /* Have PTHREAD_PRIO_INHERIT. */
 #undef HAVE_PTHREAD_PRIO_INHERIT
 
-- 
Tristan Partin
Neon (https://neon.tech)

#2Andres Freund
andres@anarazel.de
In reply to: Tristan Partin (#1)
Re: Remove pthread_is_threaded_np() checks in postmaster

Hi,
On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:

These checks are not effective for what they are trying to prevent. A recent
commit[0] in libcurl when used on macOS has been tripping the
pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
entries which use libcurl (like Neon). Under the hood, libcurl calls
SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.

Maybe I'm missing something, but isn't that indicating the exact opposite,
namely that the check is precisely doing what it's intended?

Greetings,

Andres Freund

#3Tristan Partin
tristan@neon.tech
In reply to: Andres Freund (#2)
Re: Remove pthread_is_threaded_np() checks in postmaster

On Tue Jan 23, 2024 at 1:47 PM CST, Andres Freund wrote:

Hi,
On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:

These checks are not effective for what they are trying to prevent. A recent
commit[0] in libcurl when used on macOS has been tripping the
pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
entries which use libcurl (like Neon). Under the hood, libcurl calls
SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.

Maybe I'm missing something, but isn't that indicating the exact opposite,
namely that the check is precisely doing what it's intended?

The logic in the original commit seems sound:

On Darwin, detect and report a multithreaded postmaster.

Darwin --enable-nls builds use a substitute setlocale() that may start a
thread. Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously. Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork(). Emit LOG messages about the
problem and its workaround. Back-patch to 9.0 (all supported versions).

Some reading from signal(7):

A process-directed signal may be delivered to any one of the threads
that does not currently have the signal blocked. If more than one of
the threads has the signal unblocked, then the kernel chooses an
arbitrary thread to which to deliver the signal.

So it sounds like the commit is trying to protect from the last
sentence.

And then forks only copy from their parent thread...

Please ignore this thread. I need to think of a better solution to this
problem.

--
Tristan Partin
Neon (https://neon.tech)

#4Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#3)
Re: Remove pthread_is_threaded_np() checks in postmaster

On Tue Jan 23, 2024 at 2:10 PM CST, Tristan Partin wrote:

On Tue Jan 23, 2024 at 1:47 PM CST, Andres Freund wrote:

Hi,
On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:

These checks are not effective for what they are trying to prevent. A recent
commit[0] in libcurl when used on macOS has been tripping the
pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
entries which use libcurl (like Neon). Under the hood, libcurl calls
SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.

Maybe I'm missing something, but isn't that indicating the exact opposite,
namely that the check is precisely doing what it's intended?

The logic in the original commit seems sound:

On Darwin, detect and report a multithreaded postmaster.

Darwin --enable-nls builds use a substitute setlocale() that may start a
thread. Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously. Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork(). Emit LOG messages about the
problem and its workaround. Back-patch to 9.0 (all supported versions).

Some reading from signal(7):

A process-directed signal may be delivered to any one of the threads
that does not currently have the signal blocked. If more than one of
the threads has the signal unblocked, then the kernel chooses an
arbitrary thread to which to deliver the signal.

So it sounds like the commit is trying to protect from the last
sentence.

And then forks only copy from their parent thread...

What is keeping us from using pthread_sigmask(3) instead of
sigprocmask(2)? If an extension can guarantee that threads that get
launched by it don't interact with anything Postgres-related, would that
be enough to protect from any fork(2) related issues? In the OP example,
is there any harm in the thread that libcurl inadvertantly launches from
a Postgres perspective?

--
Tristan Partin
Neon (https://neon.tech)

#5Andres Freund
andres@anarazel.de
In reply to: Tristan Partin (#4)
Re: Remove pthread_is_threaded_np() checks in postmaster

Hi,

On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:

What is keeping us from using pthread_sigmask(3) instead of sigprocmask(2)?

We would need to make sure to compile with threading support everywhere. One
issue is that on some platforms things get slower once you actually start
using pthreads.

If an extension can guarantee that threads that get launched by it don't
interact with anything Postgres-related, would that be enough to protect
from any fork(2) related issues?

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.

Greetings,

Andres Freund

#6Tristan Partin
tristan@neon.tech
In reply to: Andres Freund (#5)
Re: Remove pthread_is_threaded_np() checks in postmaster

On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:

Hi,

On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:

What is keeping us from using pthread_sigmask(3) instead of sigprocmask(2)?

We would need to make sure to compile with threading support everywhere. One
issue is that on some platforms things get slower once you actually start
using pthreads.

What are examples of these reduced performance platforms?

From reading the meson.build files, it seems like building with
threading enabled is the future, so should we just rip the band-aid off
for 17?

If an extension can guarantee that threads that get launched by it don't
interact with anything Postgres-related, would that be enough to protect
from any fork(2) related issues?

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.

If you can find a resource that explains the UB, I would be very
interested to read that. I found a SO[0]https://stackoverflow.com/a/42679479/7572728 answer that made it seem like
this actually wasn't the case.

[0]: https://stackoverflow.com/a/42679479/7572728

--
Tristan Partin
Neon (https://neon.tech)

#7Andres Freund
andres@anarazel.de
In reply to: Tristan Partin (#6)
Re: Remove pthread_is_threaded_np() checks in postmaster

Hi,

On 2024-01-23 17:26:19 -0600, Tristan Partin wrote:

On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:

Hi,

On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:

What is keeping us from using pthread_sigmask(3) instead of sigprocmask(2)?

We would need to make sure to compile with threading support everywhere. One
issue is that on some platforms things get slower once you actually start
using pthreads.

What are examples of these reduced performance platforms?

With some libc, including IIRC glibc, FILE* style io starts to use locking,
for example. Which we likely shouldn't use as heavily as we do, but we
currently do use it for things like COPY.

From reading the meson.build files, it seems like building with threading
enabled is the future, so should we just rip the band-aid off for 17?

Building with -pthreads and using threads are separate things...

If an extension can guarantee that threads that get launched by it don't
interact with anything Postgres-related, would that be enough to protect
from any fork(2) related issues?

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.

If you can find a resource that explains the UB, I would be very interested
to read that. I found a SO[0] answer that made it seem like this actually
wasn't the case.

I think there are safe ways to do it, but I don't think we currently reliably
do so. It certainly wouldn't be well defined to have a thread created in
postmaster, before backends are forked off ("the child process may only
execute async-signal-safe operations until such time as one of the exec
functions is called").

Greetings,

Andres Freund

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#7)
Re: Remove pthread_is_threaded_np() checks in postmaster

On Wed, Jan 24, 2024 at 1:39 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-01-23 17:26:19 -0600, Tristan Partin wrote:

On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.

If you can find a resource that explains the UB, I would be very interested
to read that. I found a SO[0] answer that made it seem like this actually
wasn't the case.

I think there are safe ways to do it, but I don't think we currently reliably
do so. It certainly wouldn't be well defined to have a thread created in
postmaster, before backends are forked off ("the child process may only
execute async-signal-safe operations until such time as one of the exec
functions is called").

Right, the classic example is that if you fork() while some other
thread is in malloc() or fwrite() or whatever libc or other unknown
code it might hold a mutex that will never be released in the child.

As for what exactly might be happening in this case, I tried calling
SCDynamicStoreCopyProxies() and saw a new thread sitting in
__workq_kernreturn, which smells like libdispatch AKA GCD, Apple's
thread pool job dispatch thing. I tried to step my way through and
follow along on Apple's github and saw plenty of uses of libdispatch
in CoreFoundation code, but not the culprit, and then I hit libxpc,
which is closed source so I lost interest. Boo. Then I found this
article that says some interesting stuff about all that:

https://www.evanjones.ca/fork-is-dangerous.html

That code has changed a bit since then but still tries to detect unsafe forks.

https://github.com/apple-oss-distributions/libdispatch/blob/main/src/init.c

These days, I don't think the original corruption complaint that led
to that am-I-multihreaded check being added to PostgreSQL could happen
anyway, because the postmaster would now process its state machine
serially in the main thread's work loop even if a random unexpected
thread happened to run the handler. But obviously that doesn't help
us with these other complications so that observation isn't very
interesting.

As for sigprocmask() vs pthread_sigmask(), the sources of
unspecifiedness I am aware of are: (1) Unix vendors disagreeing on
whether the former affected only the calling thread or the whole
process before threads were standardised, and we can see that they
still differ today (eg Darwin/XNU loops over all threads setting them,
while many other systems do exactly the same as pthread_sigmask()),
and (2) libcs sometimes use or defer some signals for their own
private purposes, so sometimes pthread_sigmask() has a wrapper doing
some footwork in userspace rather than just invoking the system call,
but I dunno. In one of the threads about avoiding bad behaviour
around system(), I think there might have been some ideas about
getting rid of the need to block signals at all, which I think must be
theoretically possible if the handlers are smart enough to avoid
misbehaving in child processes, and maybe we use moar latches.