[BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Started by Emmanuel Sibi, 4 months ago, 12 messages
#1Emmanuel Sibi
emmanuelsibi.mec@gmail.com
1 attachment(s)

Hi hackers, I've found a bug that causes PostgreSQL to crash during startup when built with ThreadSanitizer (-fsanitize=thread).

My environment
Ubuntu 24.04.1 LTS (kernel 6.14.0-29-generic)
clang 18
PostgreSQL 17.2
Build Configuration: ./configure --enable-debug --enable-cassert CFLAGS="-fsanitize=thread -g"

PostgreSQL compiled with ThreadSanitizer (-fsanitize=thread) crashes with SIGSEGV during program initialization, before reaching main().

Steps to Reproduce

1. Configure PostgreSQL with ThreadSanitizer: ./configure --enable-debug CFLAGS="-fsanitize=thread -g"
2. Run make
3. Run any PostgreSQL command, for example: ./postgres --version

Expected Behavior: Program should start normally and display version information.
Actual Behavior: Segmentation fault during early initialization

Root Cause: The __ubsan_default_options() function in main.c is compiled with TSan instrumentation, creating a circular dependency during sanitizer runtime initialization.
1. TSan initialization calls __ubsan_default_options()
2. TSan tries to instrument the function
3. Instrumentation requires initialized ThreadState
4. ThreadState isn't ready because TSan init isn't complete
5. Segfault/crash occurs
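
As a rough illustration of the ordering problem in steps 1-5, here is a toy model in C. It is not TSan's actual code: ToyThreadState, toy_func_entry(), toy_hook() and toy_runtime_init() are hypothetical names invented for this sketch, and the "crash" is modeled as a false return value rather than a real SIGSEGV.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sanitizer runtime's per-thread state (hypothetical). */
typedef struct ToyThreadState
{
	int			func_depth;
} ToyThreadState;

static ToyThreadState toy_state_storage;
static ToyThreadState *toy_state = NULL;	/* NULL until init has finished */

bool		toy_runtime_init(bool hook_is_instrumented);

/*
 * Models the entry hook an instrumented function would run.  It needs the
 * thread state; returning false marks the point where a real runtime would
 * dereference an unready pointer and crash.
 */
static bool
toy_func_entry(void)
{
	if (toy_state == NULL)
		return false;
	toy_state->func_depth++;
	return true;
}

/* Models the options hook, compiled either with or without instrumentation. */
static bool
toy_hook(bool instrumented)
{
	if (instrumented)
		return toy_func_entry();
	return true;
}

/*
 * Models runtime initialization: the options hook is called before the
 * thread state is published.  Returns false if the hook "crashed".
 */
bool
toy_runtime_init(bool hook_is_instrumented)
{
	bool		ok;

	toy_state = NULL;
	ok = toy_hook(hook_is_instrumented);
	toy_state = &toy_state_storage;
	return ok;
}
```

In this model, toy_runtime_init(true) reports failure while toy_runtime_init(false) succeeds, which is the reasoning behind keeping instrumentation out of the hook.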

Proposed Fix: Move __ubsan_default_options() to a separate compilation unit built without sanitizer instrumentation.
The patch attached below moves the function to a separate compilation unit with a custom Makefile rule that uses -fno-sanitize=thread,address,undefined. The reached_main check is preserved to avoid calling getenv() before libc is fully initialized and to handle cases where set_ps_display() breaks /proc/$pid/environ.

Please let me know if you have any questions or would like further details.
Thanks & Regards,
Emmanuel Sibi

Attachments:

tsan_segfault.patch (application/x-patch; name=tsan_segfault.patch; x-unix-mode=0644)
diff --git a/src/backend/main/Makefile b/src/backend/main/Makefile
index 6d34072624b..e49b10fe0c4 100644
--- a/src/backend/main/Makefile
+++ b/src/backend/main/Makefile
@@ -13,6 +13,11 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = \
-	main.o
+	main.o \
+	sanitizer_hook.o
+
+# Custom rule to build sanitizer_hook.o without sanitizer instrumentation
+sanitizer_hook.o: sanitizer_hook.c
+	$(CC) $(CPPFLAGS) $(CFLAGS) -fno-sanitize=thread,address,undefined -c -o $@ $<
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 4672aab8378..be61d8db0a6 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -42,7 +42,7 @@
 
 
 const char *progname;
-static bool reached_main = false;
+bool reached_main = false;
 
 
 static void startup_hacks(const char *progname);
@@ -415,29 +415,3 @@ check_root(const char *progname)
 #endif							/* WIN32 */
 }
 
-/*
- * At least on linux, set_ps_display() breaks /proc/$pid/environ. The
- * sanitizer library uses /proc/$pid/environ to implement getenv() as it wants
- * to work independent of libc. When just using undefined and alignment
- * sanitizers, the sanitizer library is only initialized when the first error
- * occurs, by which time we've often already called set_ps_display(),
- * preventing the sanitizer libraries from seeing the options.
- *
- * We can work around that by defining __ubsan_default_options, a weak symbol
- * libsanitizer uses to get defaults from the application, and return
- * getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
- * don't end up relying on a not-yet-working getenv().
- *
- * As this function won't get called when not running a sanitizer, it doesn't
- * seem necessary to only compile it conditionally.
- */
-const char *__ubsan_default_options(void);
-const char *
-__ubsan_default_options(void)
-{
-	/* don't call libc before it's guaranteed to be initialized */
-	if (!reached_main)
-		return "";
-
-	return getenv("UBSAN_OPTIONS");
-}
diff --git a/src/backend/main/sanitizer_hook.c b/src/backend/main/sanitizer_hook.c
new file mode 100644
index 00000000000..61302de0ead
--- /dev/null
+++ b/src/backend/main/sanitizer_hook.c
@@ -0,0 +1,55 @@
+/*-------------------------------------------------------------------------
+ *
+ * sanitizer_hook.c
+ *	  UBSan options hook to avoid initialization timing issues
+ *
+ * This file provides __ubsan_default_options() without sanitizer
+ * instrumentation to prevent circular dependencies during early
+ * program startup when set_ps_display() breaks /proc/$pid/environ.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/main/sanitizer_hook.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <signal.h>
+#include <stdlib.h>
+
+
+extern bool reached_main;
+
+
+/*
+ * At least on linux, set_ps_display() breaks /proc/$pid/environ. The
+ * sanitizer library uses /proc/$pid/environ to implement getenv() as it wants
+ * to work independent of libc. When just using undefined and alignment
+ * sanitizers, the sanitizer library is only initialized when the first error
+ * occurs, by which time we've often already called set_ps_display(),
+ * preventing the sanitizer libraries from seeing the options.
+ *
+ * We can work around that by defining __ubsan_default_options, a weak symbol
+ * libsanitizer uses to get defaults from the application, and return
+ * getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
+ * don't end up relying on a not-yet-working getenv().
+ *
+ * As this function won't get called when not running a sanitizer, it doesn't
+ * seem necessary to only compile it conditionally.
+ */
+const char *
+__ubsan_default_options(void)
+{
+	/*
+	 * Avoid calling libc until program startup is complete (reached_main).
+	 * This prevents sanitizer initialization issues when set_ps_display()
+	 * breaks /proc/$pid/environ before sanitizer can read UBSAN_OPTIONS.
+	 */
+	if (!reached_main)
+		return "";
+
+	return getenv("UBSAN_OPTIONS");
+}
#2Aleksander Alekseev
aleksander@tigerdata.com
In reply to: Emmanuel Sibi (#1)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Hi Emmanuel,

Hi hackers, I've found a bug that causes PostgreSQL to crash during startup when built with ThreadSanitizer (-fsanitize=thread).

[...]

Thanks for reporting this. Did you investigate whether Meson also has
this issue? Fixing anything for Autotools arguably has low priority
since we are going to get rid of it in the near future, but Meson is
another matter.

--
Best regards,
Aleksander Alekseev

#3Emmanuel Sibi
emmanuelsibi.mec@gmail.com
In reply to: Aleksander Alekseev (#2)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Hi Aleksander,

Thanks for reporting this. Did you investigate whether Meson also has
this issue? Fixing anything for Autotools arguably has low priority
since we are going to get rid of it in the near future, but Meson is
another matter.

Thanks for the reply. Yes, I tested with Meson and confirmed the same
issue occurs.
When building PostgreSQL 17.2 with ThreadSanitizer:
meson setup builddir \
--prefix=/path/to/install \
--buildtype=debug \
-Dcassert=true \
-Dtap_tests=enabled \
-Db_lto=false \
-Db_sanitize=thread \
-Db_lundef=false \
-Dc_args="-O0 -g -gdwarf-2 -fno-omit-frame-pointer"

The postgres binary segfaults during early initialization, exactly as
with the Autotools build.
Applying the patch I submitted, which moves __ubsan_default_options()
to a separate compilation unit built without sanitizer
instrumentation, successfully resolves the segfault.

Thanks & regards,
Emmanuel Sibi

#4Aleksander Alekseev
aleksander@tigerdata.com
In reply to: Emmanuel Sibi (#3)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Hi Emmanuel,

Thanks for the reply. Yes, I tested with Meson and confirmed the same
issue occurs.
[...]

OK, thanks for the details. Please don't forget to register your patch
on the nearest open commitfest:

https://commitfest.postgresql.org/56/

Otherwise it can be lost.

--
Best regards,
Aleksander Alekseev

#5Quan Zongliang
quanzongliang@yeah.net
In reply to: Aleksander Alekseev (#4)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

On 9/9/25 10:37 PM, Aleksander Alekseev wrote:

Hi Emmanuel,

Thanks for the reply. Yes, I tested with Meson and confirmed the same
issue occurs.
[...]

OK, thanks for the details. Please don't forget to register your patch
on the nearest open commitfest:

https://commitfest.postgresql.org/56/

Otherwise it can be lost.

I tested this patch. postgres -V no longer crashes.

There is a minor issue. I'm not sure if it's caused by this patch.

The database can only be shut down using the immediate mode.

2025-09-10 08:29:52.667 CST [53336] LOG: received immediate shutdown request
2025-09-10 08:29:57.771 CST [53336] LOG: issuing SIGKILL to recalcitrant children
2025-09-10 08:29:57.897 CST [53336] LOG: database system is shut down

In the other modes, the server logs that a shutdown request has been
received but never actually stops.

2025-09-10 08:21:13.445 CST [53280] LOG: received fast shutdown request
2025-09-10 08:21:13.446 CST [53280] LOG: aborting any active transactions
2025-09-10 08:21:13.587 CST [53280] LOG: background worker "logical replication launcher" (PID 53290) exited with exit code 1
2025-09-10 08:21:13.588 CST [53284] LOG: shutting down
2025-09-10 08:21:13.588 CST [53284] LOG: checkpoint starting: shutdown fast
2025-09-10 08:21:13.607 CST [53284] LOG: checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.017 s, sync=0.001 s, total=0.020 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/0178A340, redo lsn=0/0178A340

The smart mode is the same as well.

Best regards,
Quan Zongliang

#6Emmanuel Sibi
emmanuelsibi.mec@gmail.com
In reply to: Quan Zongliang (#5)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Hi Quan, Thanks for testing the patch! I'm glad it resolves the startup crash.

I tested this patch. postgres -V no longer crashes.

Regarding the shutdown issue - I tested extensively with
ThreadSanitizer enabled using both build configurations I mentioned
earlier, and all shutdown modes work correctly in my environment:

There is a minor issue. I'm not sure if it's caused by this patch.
The database can only be shut down using the immediate mode.

Fast shutdown:
2025-09-15 20:20:07.454 IST [28229] LOG: received fast shutdown request
[...]
2025-09-15 20:20:07.574 IST [28229] LOG: database system is shut down
Smart shutdown:
2025-09-15 20:28:14.271 IST [31263] LOG: received smart shutdown request
[...]
2025-09-15 20:28:14.399 IST [31263] LOG: database system is shut down

All modes complete within seconds with the final "database system is
shut down" message, unlike the hang you're experiencing.
The patch only moves __ubsan_default_options() to a separate
compilation unit to avoid TSan initialization issues during startup.
It doesn't modify any shutdown logic or signal handling code.
My Environment: Ubuntu 24.04.1, clang 18, PostgreSQL 17.2.

Best regards,
Emmanuel

#7Quan Zongliang
quanzongliang@yeah.net
In reply to: Emmanuel Sibi (#6)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

On 9/15/25 11:19 PM, Emmanuel Sibi wrote:

Hi Quan, Thanks for testing the patch! I'm glad it resolves the startup crash.

Fast shutdown:
2025-09-15 20:20:07.454 IST [28229] LOG: received fast shutdown request
[...]
2025-09-15 20:20:07.574 IST [28229] LOG: database system is shut down
Smart shutdown:
2025-09-15 20:28:14.271 IST [31263] LOG: received smart shutdown request
[...]
2025-09-15 20:28:14.399 IST [31263] LOG: database system is shut down

All modes complete within seconds with the final "database system is
shut down" message, unlike the hang you're experiencing.
The patch only moves __ubsan_default_options() to a separate
compilation unit to avoid TSan initialization issues during startup.
It doesn't modify any shutdown logic or signal handling code.
My Environment: Ubuntu 24.04.1, clang 18, PostgreSQL 17.2.

Great! My OS is macOS 15.6.1. I will continue to test to confirm if
there are any other issues. If so, I will create a separate patch.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Emmanuel Sibi (#1)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Emmanuel Sibi <emmanuelsibi.mec@gmail.com> writes:

Root Cause: The __ubsan_default_options() function in main.c is compiled with TSan instrumentation, creating a circular dependency during sanitizer runtime initialization.
1. TSan initialization calls __ubsan_default_options()
2. TSan tries to instrument the function
3. Instrumentation requires initialized ThreadState
4. ThreadState isn't ready because TSan init isn't complete
5. Segfault/crash occurs

Hmm. I wonder what is the argument that this is not a bug
of UBSan itself, as it seems to make __ubsan_default_options()
next door to impossible to use safely.

Proposed Fix: Move __ubsan_default_options() to a separate compilation unit built without sanitizer instrumentation.

I do not love this fix, as it requires exposing reached_main globally,
not to mention getting both of our build systems involved in the hack.
Another problem is that it only defends against a limited set of
sanitizers, though presumably every single one is broken in the same
way (compare [1]).

I tried this as an alternative solution, but it didn't seem to help:

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index bdcb5e4f261..cc63da97360 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -500,6 +500,12 @@ check_root(const char *progname)
  * seem necessary to only compile it conditionally.
  */
 const char *__ubsan_default_options(void);
+
+#if __has_attribute(no_sanitize)
+__attribute__((no_sanitize("thread")))
+__attribute__((no_sanitize("address")))
+__attribute__((no_sanitize("undefined")))
+#endif
 const char *
 __ubsan_default_options(void)
 {

This is of course no better on the "limited set of sanitizers"
angle, but it at least keeps the hack localized. Of course,
if it doesn't work that's all moot, but I wonder why not ---
seems like it should have about the same effect as your proposal.
(I did verify that clang complains if I misspell a no_sanitize
argument, so it's not that the syntax has no effect at all.)

Anyway, I think really a bug report to the UBSan folk asking
how one is supposed to use __ubsan_default_options() safely
might be productive.

regards, tom lane

[1]: /messages/by-id/dbf77bf7-6e54-ed8a-c4ae-d196eeb664ce@gmail.com

#9Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#8)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

On Tue, Nov 4, 2025 at 2:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another problem is that it only defends against a limited set of
sanitizers, though presumably every single one is broken in the same
way (compare [1]).

How about __attribute__((disable_sanitizer_instrumentation)) ? LLVM's
own tests make some use of this [1].

--Jacob

[1]: https://github.com/llvm/llvm-project/blob/2b4ac6629/compiler-rt/test/sanitizer_common/TestCases/dlsym_alloc.c

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#9)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Jacob Champion <jacob.champion@enterprisedb.com> writes:

On Tue, Nov 4, 2025 at 2:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Another problem is that it only defends against a limited set of
sanitizers, though presumably every single one is broken in the same
way (compare [1]).

How about __attribute__((disable_sanitizer_instrumentation)) ? LLVM's
own tests make some use of this [1].

Hah, thanks for the research! For me, this stops the failure
(on RHEL9 with clang version 19.1.7):

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index bdcb5e4f261..1bd63ec9184 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -500,6 +500,10 @@ check_root(const char *progname)
  * seem necessary to only compile it conditionally.
  */
 const char *__ubsan_default_options(void);
+
+#if __has_attribute(disable_sanitizer_instrumentation)
+__attribute__((disable_sanitizer_instrumentation))
+#endif
 const char *
 __ubsan_default_options(void)
 {

Assuming that works for Emmanuel, we could wrap it in a
pg_disable_sanitizer_instrumentation macro, or just use it
as-is. I don't have a strong preference --- any thoughts?

(It could do with a comment, either way.)
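
For reference, the macro-wrapped variant mentioned above could look roughly like the following self-contained sketch. This is only an illustration under assumptions, not committed code: the fallback definition of __has_attribute, the file-local reached_main flag, and the helper demo_mark_main_reached() are stand-ins added here so the fragment compiles on its own.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Compilers lacking __has_attribute: treat every attribute as unsupported. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/*
 * Expands to the attribute where the compiler knows it, and to nothing
 * elsewhere.
 */
#if __has_attribute(disable_sanitizer_instrumentation)
#define pg_disable_sanitizer_instrumentation \
	__attribute__((disable_sanitizer_instrumentation))
#else
#define pg_disable_sanitizer_instrumentation
#endif

/* Stand-in for main.c's flag, which main() sets once it has been reached. */
static bool reached_main = false;

const char *__ubsan_default_options(void);
void		demo_mark_main_reached(void);

/*
 * The sanitizer runtime may call this while it is still initializing, so the
 * function itself must not carry sanitizer instrumentation.
 */
pg_disable_sanitizer_instrumentation
const char *
__ubsan_default_options(void)
{
	/* don't call libc before it's guaranteed to be initialized */
	if (!reached_main)
		return "";

	return getenv("UBSAN_OPTIONS");
}

/* Hypothetical helper standing in for "reached_main = true" in main(). */
void
demo_mark_main_reached(void)
{
	reached_main = true;
}
```

On a compiler that does not recognize the attribute, the macro is empty and the function is compiled exactly as before.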

regards, tom lane

#11Jacob Champion
jacob.champion@enterprisedb.com
In reply to: Tom Lane (#10)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

On Tue, Nov 4, 2025 at 5:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hah, thanks for the research! For me, this stops the failure
(on RHEL9 with clang version 19.1.7):

Awesome!

Assuming that works for Emmanuel, we could wrap it in a
pg_disable_sanitizer_instrumentation macro, or just use it
as-is. I don't have a strong preference --- any thoughts?

No preference on my end. (If a second place to use it pops up, we
could wrap it then.)

--Jacob

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jacob Champion (#11)
Re: [BUG] PostgreSQL crashes with ThreadSanitizer during early initialization

Jacob Champion <jacob.champion@enterprisedb.com> writes:

On Tue, Nov 4, 2025 at 5:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Assuming that works for Emmanuel, we could wrap it in a
pg_disable_sanitizer_instrumentation macro, or just use it
as-is. I don't have a strong preference --- any thoughts?

No preference on my end. (If a second place to use it pops up, we
could wrap it then.)

Yeah, that's what I concluded after sleeping on it. Right now
it seems unlikely that there will be more usages, so adding a
macro in c.h would just slow down the build (admittedly only
microscopically) for no gain. If we find additional usages
then we can revisit that tradeoff.

regards, tom lane