ubsan
Hi,
I tried to run postgres with ubsan to debug something. I ran into two main
issues:
1) Despite Tom's recent efforts in [1]/messages/by-id/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com, I see two ubsan failures in
HEAD.
These are easy enough to fix as per the attached, although the fix for the
GetConfigOptionByNum() isn't great - we should probably pass a nulls array
to GetConfigOptionByNum() as well, but that'd have a bunch of followup
changes. So I'm inclined to go with the minimal for now.
2) When debugging issues I got very confused by the fact that *sometimes*
UBSAN_OPTIONS takes effect and sometimes not. I was trying to have ubsan
generate backtraces as well as a coredump with
UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2"
After a lot of debugging I figured out that the options took effect in
postmaster, but not in any children. Which in turn is because
set_ps_display() breaks /proc/$pid/environ - it's empty in all postmaster
children for me.
The sanitizer library use /proc/$pid/environ (from [2]https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/sanitizer_common/sanitizer_linux.cpp;h=aa59d9718ca89cc554bdf677df3e64ddd233ca59;hb=refs/heads/master#l559 to [3]), because
they don't want to use getenv() because it wants to work without libc
(whether that's the right way, i have my doubts). When just using
undefined and alignment sanitizers, the sanitizer library is only
initialized when the first error occurs, by which time we've often already
called set_ps_display().
Note that ps_status.c breaks /proc/$pid/environ even if the
update_process_title GUC is set to false, because init_ps_display() ignores
that. Yes, that confused me for a while last night.
The reason that /proc/$pid/environ is borken is fairly obvious: We
overwrite it in set_ps_display() in the PS_USE_CLOBBER_ARGV case. The
kernel just looks at the range set up originally, which we'll overwrite
with zeroes.
We could try telling the kernel about the new location of environ using
prctl() PR_SET_MM_ENV_START/END but the restrictions around that sound
problematic.
I've included a hacky workaround: Define __ubsan_default_options, a weak
symbol libsanitizer uses to get defaults from the application, and return
getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
don't end up relying on a not-yet-working getenv().
This also lead me to finally figure out why /proc/$pid/cmdline of postgres
children includes a lot of NULL bytes. I'd noticed this because tools like
pidstat -l started to print long long status strings at some point, before
it got fixed on the pidstat side.
The way we overwrite doesn't trigger this special case in the kernel:
https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L296
therefore the original size of the commandline arguments are printed, with
a lot of boring zeroes.
A test run of this in ci, with the guc.c intentionally reintroduced, shows the
failure both via core dump
https://api.cirrus-ci.com/v1/task/6543164315009024/logs/cores.log
and
in postmaster's log: https://api.cirrus-ci.com/v1/artifact/task/6543164315009024/log/src/test/regress/log/postmaster.log
although that part is a bit annoying to read, because the error is
interspersed with other log messages:
guc.c:9801:15: runtime error: null pointer passed as argument 2, which is declared to never be null
==19253==Using libbacktrace symbolizer.
2022-03-23 17:20:35.412 UTC [19258][client backend] [pg_regress/alter_operator][14/429:0] ERROR: must be owner of operator ===
...
2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT: ALTER STATISTICS alt_stat2 OWNER TO regress_alter_generic_user2;
#0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
#1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR: must be owner of statistics object alt_stat3
...
2022-03-23 17:20:35.601 UTC [19254][client backend] [pg_regress/alter_generic][10/1569:0] STATEMENT: ALTER STATISTICS alt_stat2 OWNER TO regress_alter_generic_user2;
#0 0x5626562b2ab4 in GetConfigOptionByNum /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:9801
#1 0x5626562b3fd5 in show_all_settings /tmp/cirrus-ci-build/src/backend/utils/misc/guc.c:10137
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] ERROR: must be owner of statistics object alt_stat3
2022-03-23 17:20:35.604 UTC [19254][client backend] [pg_regress/alter_generic][10/1576:0] STATEMENT: ALTER STATISTICS alt_stat3 RENAME TO alt_stat4;
#2 0x562655c0ea86 in ExecMakeTableFunctionResult /tmp/cirrus-ci-build/src/backend/executor/execSRF.c:234
#3 0x562655c3f8be in FunctionNext /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:95
2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] ERROR: must be owner of statistics object alt_stat3
2022-03-23 17:20:35.605 UTC [19254][client backend] [pg_regress/alter_generic][10/1578:0] STATEMENT: ALTER STATISTICS alt_stat3 OWNER TO regress_alter_generic_user2;
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] ERROR: must be member of role "regress_alter_generic_user3"
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1579:0] STATEMENT: ALTER STATISTICS alt_stat2 OWNER TO regress_alter_generic_user3;
#4 0x562655c10175 in ExecScanFetch /tmp/cirrus-ci-build/src/backend/executor/execScan.c:133
#5 0x562655c10653 in ExecScan /tmp/cirrus-ci-build/src/backend/executor/execScan.c:199
#6 0x562655c3f643 in ExecFunctionScan /tmp/cirrus-ci-build/src/backend/executor/nodeFunctionscan.c:270
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] ERROR: must be owner of statistics object alt_stat3
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1580:0] STATEMENT: ALTER STATISTICS alt_stat3 SET SCHEMA alt_nsp2;
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] ERROR: statistics object "alt_stat2" already exists in schema "alt_nsp2"
2022-03-23 17:20:35.606 UTC [19254][client backend] [pg_regress/alter_generic][10/1581:0] STATEMENT: ALTER STATISTICS alt_stat2 SET SCHEMA alt_nsp2;
#7 0x562655c09bc9 in ExecProcNodeFirst /tmp/cirrus-ci-build/src/backend/executor/execProcnode.c:463
#8 0x562655bf7580 in ExecProcNode ../../../src/include/executor/executor.h:259
#9 0x562655bf7580 in ExecutePlan /tmp/cirrus-ci-build/src/backend/executor/execMain.c:1633
#10 0x562655bf78b9 in standard_ExecutorRun /tmp/cirrus-ci-build/src/backend/executor/execMain.c:362
Greetings,
Andres Freund
[1]: /messages/by-id/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/sanitizer_common/sanitizer_linux.cpp;h=aa59d9718ca89cc554bdf677df3e64ddd233ca59;hb=refs/heads/master#l559
[2]: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libsanitizer/sanitizer_common/sanitizer_linux.cpp;h=aa59d9718ca89cc554bdf677df3e64ddd233ca59;hb=refs/heads/master#l559
Attachments:
v3-0001-configure-check-for-dlsym-not-just-dlopen.patchtext/x-diff; charset=us-asciiDownload
From 66117f8c593c5041bde9f50f95bfd7a6a0664744 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 23 Mar 2022 10:01:00 -0700
Subject: [PATCH v3 1/5] configure: check for dlsym not just dlopen.
When building with sanitizers the sanitizer library provides dlopen, but not
dlsym(), making configure think that -ldl isn't needed.
---
configure | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++
configure.ac | 3 +++
2 files changed, 59 insertions(+)
diff --git a/configure b/configure
index f3cb5c2b511..e605f5a17f1 100755
--- a/configure
+++ b/configure
@@ -11912,6 +11912,62 @@ if test "$ac_res" != no; then :
fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing dlsym" >&5
+$as_echo_n "checking for library containing dlsym... " >&6; }
+if ${ac_cv_search_dlsym+:} false; then :
+ $as_echo_n "(cached) " >&6
+else
+ ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h. */
+
+/* Override any GCC internal prototype to avoid an error.
+ Use char because int might match the return type of a GCC
+ builtin and then its argument prototype would still apply. */
+#ifdef __cplusplus
+extern "C"
+#endif
+char dlsym ();
+int
+main ()
+{
+return dlsym ();
+ ;
+ return 0;
+}
+_ACEOF
+for ac_lib in '' dl; do
+ if test -z "$ac_lib"; then
+ ac_res="none required"
+ else
+ ac_res=-l$ac_lib
+ LIBS="-l$ac_lib $ac_func_search_save_LIBS"
+ fi
+ if ac_fn_c_try_link "$LINENO"; then :
+ ac_cv_search_dlsym=$ac_res
+fi
+rm -f core conftest.err conftest.$ac_objext \
+ conftest$ac_exeext
+ if ${ac_cv_search_dlsym+:} false; then :
+ break
+fi
+done
+if ${ac_cv_search_dlsym+:} false; then :
+
+else
+ ac_cv_search_dlsym=no
+fi
+rm conftest.$ac_ext
+LIBS=$ac_func_search_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_dlsym" >&5
+$as_echo "$ac_cv_search_dlsym" >&6; }
+ac_res=$ac_cv_search_dlsym
+if test "$ac_res" != no; then :
+ test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
+
+fi
+ # Address Sanitizer provides dlopen but not dlsym
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing socket" >&5
$as_echo_n "checking for library containing socket... " >&6; }
if ${ac_cv_search_socket+:} false; then :
diff --git a/configure.ac b/configure.ac
index 19d1a803673..b2440e9ae2d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1230,6 +1230,9 @@ AC_SUBST(PTHREAD_LIBS)
AC_CHECK_LIB(m, main)
AC_SEARCH_LIBS(setproctitle, util)
AC_SEARCH_LIBS(dlopen, dl)
+# Address sanitizer's helper library provides dlopen but not dlsym, thus when
+# enabling asan the dlopen check doesn't notice that -ldl is actually required.
+AC_SEARCH_LIBS(dlsym, dl)
AC_SEARCH_LIBS(socket, [socket ws2_32])
AC_SEARCH_LIBS(shl_load, dld)
AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
--
2.35.1.354.g715d08a9e5
v3-0002-Hack-up-compatibility-between-ubsan-and-ps_status.patchtext/x-diff; charset=us-asciiDownload
From 97f22d7c7b126464fa2262f93519bcb3bc25c3dc Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 23 Mar 2022 09:57:19 -0700
Subject: [PATCH v3 2/5] Hack up compatibility between ubsan and ps_status.c.
---
src/backend/main/main.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c43a527d3f9..7998fdd1f3f 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -48,6 +48,7 @@
const char *progname;
+static bool reached_main = false;
static void startup_hacks(const char *progname);
@@ -64,6 +65,8 @@ main(int argc, char *argv[])
{
bool do_check_root = true;
+ reached_main = true;
+
/*
* If supported on the current platform, set up a handler to be called if
* the backend/postmaster crashes with a fatal signal or exception.
@@ -443,3 +446,18 @@ check_root(const char *progname)
}
#endif /* WIN32 */
}
+
+const char *__ubsan_default_options(void);
+const char *
+__ubsan_default_options(void)
+{
+ /* don't call libc before it's initialized */
+ if (!reached_main)
+ return "";
+
+ /*
+ * Use our getenv because libsanitizer gets confused by ps_status.c
+ * overwriting the environ block.
+ */
+ return getenv("UBSAN_OPTIONS");
+}
--
2.35.1.354.g715d08a9e5
v3-0003-Fix-two-ubsan-violations.patchtext/x-diff; charset=us-asciiDownload
From 4886f82cc8dcb7de19f191dc58c7beda3417a1f4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 23 Mar 2022 09:44:19 -0700
Subject: [PATCH v3 3/5] Fix two ubsan violations.
---
src/backend/utils/cache/relcache.c | 2 +-
src/backend/utils/misc/guc.c | 5 ++++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index fbd11883e17..3d05297b0d9 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -6528,7 +6528,7 @@ write_item(const void *data, Size len, FILE *fp)
{
if (fwrite(&len, 1, sizeof(len), fp) != sizeof(len))
elog(FATAL, "could not write init file");
- if (fwrite(data, 1, len, fp) != len)
+ if (len > 0 && fwrite(data, 1, len, fp) != len)
elog(FATAL, "could not write init file");
}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 932aefc777d..fb88a5e5828 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -9797,7 +9797,10 @@ GetConfigOptionByNum(int varnum, const char **values, bool *noshow)
values[4] = _(conf->short_desc);
/* extra_desc */
- values[5] = _(conf->long_desc);
+ if (conf->long_desc)
+ values[5] = _(conf->long_desc);
+ else
+ values[5] = "";
/* context */
values[6] = GucContext_Names[conf->context];
--
2.35.1.354.g715d08a9e5
v3-0004-ci-use-fsanitize-undefined-alignment-in-linux-tas.patchtext/x-diff; charset=us-asciiDownload
From e5e69c83957ece0086c2b7a958152857b5930050 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 23 Jan 2022 22:20:10 -0800
Subject: [PATCH v3 4/5] ci: use -fsanitize=undefined,alignment in linux tasks.
---
.cirrus.yml | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/.cirrus.yml b/.cirrus.yml
index e5335fede76..820aafce292 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -157,6 +157,21 @@ task:
LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
+ # Enable a reasonable set of sanitizers. Use the linux task for that, as
+ # it currently is the fastest task. Also several of the sanitizers work
+ # best on linux.
+ # The overhead of alignment sanitizer is low, undefined behaviour has
+ # moderate overhead. address sanitizer howerever is pretty expensive and
+ # thus not enabled.
+ #
+ # disable_coredump=0, abort_on_error=1: for useful backtraces in case of crashes
+ # print_stacktraces=1,verbosity=2, duh
+ # detect_leaks=0: too many uninteresting leak errors in short-lived binaries
+ UBSAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2
+ ASAN_OPTIONS: detect_leaks=0:disable_coredump=0:abort_on_error=1
+ EXTRA_CFLAGS: -fsanitize=alignment,undefined -fno-sanitize-recover=all
+ EXTRA_LDFLAGS: -fno-common
+
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
compute_engine_instance:
@@ -201,8 +216,9 @@ task:
CC="ccache gcc" \
CXX="ccache g++" \
CLANG="ccache clang" \
- CFLAGS="-Og -ggdb" \
- CXXFLAGS="-Og -ggdb"
+ CFLAGS="-Og -ggdb ${EXTRA_CFLAGS}" \
+ CXXFLAGS="-Og -ggdb ${EXTRA_CXXFLAGS}" \
+ LDFLAGS="${EXTRA_LDFLAGS}"
EOF
build_script: su postgres -c "make -s -j${BUILD_JOBS} world-bin"
upload_caches: ccache
--
2.35.1.354.g715d08a9e5
Andres Freund <andres@anarazel.de> writes:
I tried to run postgres with ubsan to debug something.
For 0001, could we just replace configure's dlopen check with the
dlsym check? Or are you afraid of reverse-case failures?
0002: ugh, but my only real complaint is that __ubsan_default_options
needs more than zero comment. Also, it's not "our" getenv is it?
0003: OK. Interesting though that we haven't seen these before.
0004: no opinion
regards, tom lane
Hi,
On 2022-03-23 13:54:50 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I tried to run postgres with ubsan to debug something.
For 0001, could we just replace configure's dlopen check with the
dlsym check? Or are you afraid of reverse-case failures?
Yea, I was worried about that. But now that I think more about it, it's hard
to believe something could provide / intercept dlsym but not dlopen. I guess
we can try and see?
0002: ugh, but my only real complaint is that __ubsan_default_options
needs more than zero comment.
Yea, definitely. I am still hoping that somebody could see a better approach
than that ugly hack.
Haven't yet checked, but probably should also verify asan either doesn't have
the same problem or provide the same hack for ASAN_OPTIONS.
Also, it's not "our" getenv is it?
Not really. "libc's getenv()"?
0003: OK. Interesting though that we haven't seen these before.
I assume it's a question of library version and configure flags.
Looks like the fwrite nonnull case isn't actually due to the nonnull
attribute, but just fwrite() getting intercepted by the sanitizer
library. Looks like that was added starting in gcc 9 [1]5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1151) #if SANITIZER_INTERCEPT_FWRITE 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1152) INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) { 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1153) // libc file streams can call user-supplied functions, see fopencookie. 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1154) void *ctx; 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1155) COMMON_INTERCEPTOR_ENTER(ctx, fwrite, p, size, nmemb, file); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1156) SIZE_T res = REAL(fwrite)(p, size, nmemb, file); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1157) if (res > 0) COMMON_INTERCEPTOR_READ_RANGE(ctx, p, res * size); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1158) return res; 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1159) }
And the guc.c case presumably requires --enable-nls and a version of gettext
using the nonnull attribute?
Wonder if there's a few functions we should add nonnull to ourselves. Probably
would help "everyday compiler warnings", static analyzers, and ubsan.
Greetings,
Andres Freund
[1]: 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1151) #if SANITIZER_INTERCEPT_FWRITE 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1152) INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) { 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1153) // libc file streams can call user-supplied functions, see fopencookie. 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1154) void *ctx; 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1155) COMMON_INTERCEPTOR_ENTER(ctx, fwrite, p, size, nmemb, file); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1156) SIZE_T res = REAL(fwrite)(p, size, nmemb, file); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1157) if (res > 0) COMMON_INTERCEPTOR_READ_RANGE(ctx, p, res * size); 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1158) return res; 5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1159) }
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1151) #if SANITIZER_INTERCEPT_FWRITE
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1152) INTERCEPTOR(SIZE_T, fwrite, const void *p, uptr size, uptr nmemb, void *file) {
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1153) // libc file streams can call user-supplied functions, see fopencookie.
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1154) void *ctx;
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1155) COMMON_INTERCEPTOR_ENTER(ctx, fwrite, p, size, nmemb, file);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1156) SIZE_T res = REAL(fwrite)(p, size, nmemb, file);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1157) if (res > 0) COMMON_INTERCEPTOR_READ_RANGE(ctx, p, res * size);
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1158) return res;
5d3805fca3e9 (Jakub Jelinek 2017-10-19 13:23:59 +0200 1159) }
$ git describe --tags 5d3805fca3e9
basepoints/gcc-8-3961-g5d3805fca3e
Hi,
On 2022-03-23 11:21:37 -0700, Andres Freund wrote:
On 2022-03-23 13:54:50 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I tried to run postgres with ubsan to debug something.
For 0001, could we just replace configure's dlopen check with the
dlsym check? Or are you afraid of reverse-case failures?Yea, I was worried about that. But now that I think more about it, it's hard
to believe something could provide / intercept dlsym but not dlopen. I guess
we can try and see?0003: OK. Interesting though that we haven't seen these before.
I think we should backpatch both, based on the reasoning in
46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
I think we should backpatch both, based on the reasoning in
46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?
Yeah, I suppose. Is anyone going to step up and run a buildfarm
member with ubsan enabled? (I'm already checking -fsanitize=alignment
on longfin, but it seems advisable to keep that separate from
-fsanitize=undefined.)
regards, tom lane
Hi,
On 2022-03-23 15:58:09 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I think we should backpatch both, based on the reasoning in
46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?Yeah, I suppose. Is anyone going to step up and run a buildfarm
member with ubsan enabled? (I'm already checking -fsanitize=alignment
on longfin, but it seems advisable to keep that separate from
-fsanitize=undefined.)
I'm planning to enable it on two of mine. Looks like gcc and clang find
slightly different things, so I was intending to enable it on one of each.
Greetings,
Andres Freund
Hi,
On 2022-03-23 13:12:34 -0700, Andres Freund wrote:
I'm planning to enable it on two of mine. Looks like gcc and clang find
slightly different things, so I was intending to enable it on one of each.
Originally I'd planned to mix them into existing members, but I think it'd be
better to have dedicated ones. Applied for a few new buildfarm names for:
{gcc,clang}-{-fsanitize=undefined,-fsanitize=address}.
Running with asan found an existing use-after-free bug in pg_waldump (*), a bug in
dshash_seq_next() next that probably can't be hit in HEAD and a bug in my
shared memory stats patch. I count that as a success.
It's particularly impressive that the cost of running with ASAN is *so* much
lower than valgrind. On my workstation a check-world with
-fsanitize=alignment,undefined,address takes 3min17s, vs 1min10s or so without
-fsanitize. Not something to always use, but certainly better than valgrind.
Greetings,
Andres Freund
(*) search_directory() uses fname = xlde->d_name after closedir(). Found in
pg_verifybackup.c's tests. Probably worth adding a few simple tests to
pg_waldump itself.
Andres Freund <andres@anarazel.de> writes:
Running with asan found an existing use-after-free bug in pg_waldump (*), a bug in
dshash_seq_next() next that probably can't be hit in HEAD and a bug in my
shared memory stats patch. I count that as a success.
Nice!
regards, tom lane
On Wed, Mar 23, 2022 at 03:58:09PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I think we should backpatch both, based on the reasoning in
46ab07ffda9d6c8e63360ded2d4568aa160a7700 ?Yeah, I suppose. Is anyone going to step up and run a buildfarm
member with ubsan enabled?
thorntail has been running with UBSan since 2019. I've removed flag
-fno-sanitize=nonnull-attribute, which your changes rendered superfluous.
On 3/23/22 16:55, Andres Freund wrote:
It's particularly impressive that the cost of running with ASAN is *so* much
lower than valgrind. On my workstation a check-world with
-fsanitize=alignment,undefined,address takes 3min17s, vs 1min10s or so without
-fsanitize. Not something to always use, but certainly better than valgrind.
It also catches things that valgrind does not so that's a bonus.
One thing to note, though. I have noticed that when enabling
-fsanitize=undefined and/or -fsanitize=address in combination with
-fprofile-arcs -ftest-coverage there is a loss in reported coverage, at
least on gcc 9.3. This may not be very obvious unless coverage is
normally at 100%.
Regards,
-David
Hi,
On 2022-03-23 15:55:28 -0700, Andres Freund wrote:
Originally I'd planned to mix them into existing members, but I think it'd be
better to have dedicated ones. Applied for a few new buildfarm names for:
{gcc,clang}-{-fsanitize=undefined,-fsanitize=address}.
They're now enabled...
tamandua: gcc, -fsanitize=undefined,alignment
kestrel: clang, -fsanitize=undefined,alignment
grassquit: gcc, -fsanitize=address
olingo: clang, -fsanitize=address
The first three have started reporting in, the last is starting its first run
now.
Greetings,
Andres Freund
Hi,
On 2022-03-23 13:54:50 -0400, Tom Lane wrote:
0002: ugh, but my only real complaint is that __ubsan_default_options
needs more than zero comment. Also, it's not "our" getenv is it?0004: no opinion
Attached is a rebased version of this patch. Hopefully with a reasonable
amount of comments? I kind of wanted to add a comment to reached_main, but it
just seems to end up restating the variable name...
Greetings,
Andres Freund
Attachments:
v2-0001-Add-a-hacky-workaround-to-make-ubsan-and-ps_statu.patchtext/x-diff; charset=us-asciiDownload
From 641f77e53c86ad7eaa15138984ce39471689b1d9 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 23 Mar 2022 09:57:19 -0700
Subject: [PATCH v2 1/3] Add a hacky workaround to make ubsan and ps_status.c
compatible
At least on linux set_ps_display() breaks /proc/$pid/environ. The sanitizer
library uses /proc/$pid/environ to implement getenv() because it wants to work
independent of libc. When just using undefined and alignment sanitizers, the
sanitizer library is only initialized when the first error occurs, by which
time we've often already called set_ps_display(), preventing the sanitizer
libraries from seeing the options.
We can work around that by defining __ubsan_default_options, a weak symbol
libsanitizer uses to get defaults from the application, and return
getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
don't end up relying on a not-yet-working getenv().
As it's just a function that won't get called when not running a sanitizer, it
doesn't seem necessary to make compilation of the function conditional.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20220323173537.ll7klrglnp4gn2um@alap3.anarazel.de
---
src/backend/main/main.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index f5da4260a13..16bcf045ae6 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -43,6 +43,7 @@
const char *progname;
+static bool reached_main = false;
static void startup_hacks(const char *progname);
@@ -59,6 +60,8 @@ main(int argc, char *argv[])
{
bool do_check_root = true;
+ reached_main = true;
+
/*
* If supported on the current platform, set up a handler to be called if
* the backend/postmaster crashes with a fatal signal or exception.
@@ -421,3 +424,27 @@ check_root(const char *progname)
}
#endif /* WIN32 */
}
+
+/*
+ * At least on linux set_ps_display() breaks /proc/$pid/environ. The sanitizer
+ * library uses /proc/$pid/environ to implement getenv() because it wants to
+ * work independent of libc. When just using undefined and alignment
+ * sanitizers, the sanitizer library is only initialized when the first error
+ * occurs, by which time we've often already called set_ps_display(),
+ * preventing the sanitizer libraries from seeing the options.
+ *
+ * We can work around that by defining __ubsan_default_options, a weak symbol
+ * libsanitizer uses to get defaults from the application, and return
+ * getenv("UBSAN_OPTIONS"). But only if main already was reached, so that we
+ * don't end up relying on a not-yet-working getenv().
+ */
+const char *__ubsan_default_options(void);
+const char *
+__ubsan_default_options(void)
+{
+ /* don't call libc before it's initialized */
+ if (!reached_main)
+ return "";
+
+ return getenv("UBSAN_OPTIONS");
+}
--
2.37.3.542.gdd3f6c4cae
v2-0002-ci-use-fsanitize-undefined-alignment-in-linux-tas.patchtext/x-diff; charset=us-asciiDownload
From e081cd01fba0a860a9cb3c0e4c9525e1c4146e57 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 29 Sep 2022 17:44:45 -0700
Subject: [PATCH v2 2/3] ci: use -fsanitize=undefined,alignment in linux tasks
---
.cirrus.yml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/.cirrus.yml b/.cirrus.yml
index 7b5cb021027..c329a2befe4 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -169,6 +169,19 @@ task:
LINUX_CONFIGURE_FEATURES: *LINUX_CONFIGURE_FEATURES
LINUX_MESON_FEATURES: *LINUX_MESON_FEATURES
+ # Enable a reasonable set of sanitizers. Use the linux task for that, as
+ # it currently is the fastest task. Also several of the sanitizers work
+ # best on linux.
+ # The overhead of alignment sanitizer is low, undefined behaviour has
+ # moderate overhead. address sanitizer howerever is pretty expensive and
+ # thus not enabled.
+ #
+ # disable_coredump=0, abort_on_error=1: for useful backtraces in case of crashes
+ # print_stacktraces=1,verbosity=2, duh
+ # detect_leaks=0: too many uninteresting leak errors in short-lived binaries
+ UBSAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2
+ ASAN_OPTIONS: print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0
+
only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*linux.*'
compute_engine_instance:
@@ -230,6 +243,10 @@ task:
- name: Linux - Debian Bullseye - Meson
configure_script: |
+ SANITIZER_FLAGS="-fsanitize=alignment,undefined -fno-sanitize-recover=all -fno-common"
+ CFLAGS="$SANITIZER_FLAGS $CFLAGS"
+ LDFLAGS="$SANITIZER_FLAGS $LDFLAGS"
+
su postgres <<-EOF
meson setup \
--buildtype=debug \
--
2.37.3.542.gdd3f6c4cae
Hi,
On 2022-09-29 18:17:55 -0700, Andres Freund wrote:
Attached is a rebased version of this patch. Hopefully with a reasonable
amount of comments? I kind of wanted to add a comment to reached_main, but it
just seems to end up restating the variable name...
I've now pushed a version of this with a few cleanups, mostly in
.cirrus.yml. It'll be interesting to see how many additional problems
it finds via cfbot.
Greetings,
Andres Freund
On Mon, Nov 21, 2022 at 03:15:03PM -0800, Andres Freund wrote:
Hi,
On 2022-09-29 18:17:55 -0700, Andres Freund wrote:
Attached is a rebased version of this patch. Hopefully with a reasonable
amount of comments? I kind of wanted to add a comment to reached_main, but it
just seems to end up restating the variable name...I've now pushed a version of this with a few cleanups, mostly in
Thanks. I'd meant to ask if there's a reason why you didn't use
meson -D sanitize ?
I recall seeing a bug which affected linking .. maybe that's why ...
--
Justin
Hi,
On November 21, 2022 3:42:38 PM PST, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Nov 21, 2022 at 03:15:03PM -0800, Andres Freund wrote:
Hi,
On 2022-09-29 18:17:55 -0700, Andres Freund wrote:
Attached is a rebased version of this patch. Hopefully with a reasonable
amount of comments? I kind of wanted to add a comment to reached_main, but it
just seems to end up restating the variable name...I've now pushed a version of this with a few cleanups, mostly in
Thanks. I'd meant to ask if there's a reason why you didn't use
meson -D sanitize ?I recall seeing a bug which affected linking .. maybe that's why ...
Doesn't allow enabling multiple sanitizers (the PR for that might recently have been merged, but that doesn't help us yet). We also need to add the no-recover flag anyway. So there doesn't seem to be much point.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Hello Andres,
22.11.2022 02:15, Andres Freund wrote:
Hi,
On 2022-09-29 18:17:55 -0700, Andres Freund wrote:
Attached is a rebased version of this patch. Hopefully with a reasonable
amount of comments? I kind of wanted to add a comment to reached_main, but it
just seems to end up restating the variable name...I've now pushed a version of this with a few cleanups, mostly in
.cirrus.yml. It'll be interesting to see how many additional problems
it finds via cfbot.
I've just discovered that that function __ubsan_default_options() is
incompatible with -fsanitize=hwaddress:
$ tmp_install/usr/local/pgsql/bin/postgres
Segmentation fault
Program received signal SIGSEGV, Segmentation fault.
0x000000555639e3ec in __hwasan_check_x0_0 ()
(gdb) bt
#0 0x000000555639e3ec in __hwasan_check_x0_0 ()
#1 0x000000555697b5a8 in __ubsan_default_options () at main.c:446
#2 0x0000005556367e48 in InitializeFlags ()
at /home/builder/.termux-build/libllvm/src/compiler-rt/lib/hwasan/hwasan.cpp:133
#3 __hwasan_init () at /home/builder/.termux-build/libllvm/src/compiler-rt/lib/hwasan/hwasan.cpp:351
#4 0x0000007ff7f4929c in __dl__ZL13call_functionPKcPFviPPcS2_ES0_ () from /system/bin/linker64
#5 0x0000007ff7f4900c in __dl__ZL10call_arrayIPFviPPcS1_EEvPKcPT_mbS5_ () from /system/bin/linker64
#6 0x0000007ff7f45670 in __dl__ZL29__linker_init_post_relocationR19KernelArgumentBlockR6soinfo ()
from /system/bin/linker64
#7 0x0000007ff7f449c8 in __dl___linker_init () from /system/bin/linker64
#8 0x0000007ff7f4b208 in __dl__start () from /system/bin/linker64
I use clang version 16.0.6, Target: aarch64-unknown-linux-android24.
With just 'return ""' in __ubsan_default_options(), I've managed to run
`make check` (there is also an issue with check_stack_depth(),
but that's another story)...
Best regards,
Alexander
Alexander Lakhin <exclusion@gmail.com> writes:
I've just discovered that that function __ubsan_default_options() is
incompatible with -fsanitize=hwaddress:
$ tmp_install/usr/local/pgsql/bin/postgres
Segmentation fault
Hi Alexander, I wonder if you can still reproduce this problem,
and if so whether the patch proposed at [1]/messages/by-id/707073.1762305575@sss.pgh.pa.us fixes it.
regards, tom lane
Hello Tom,
05.11.2025 03:34, Tom Lane wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
I've just discovered that that function __ubsan_default_options() is
incompatible with -fsanitize=hwaddress:
$ tmp_install/usr/local/pgsql/bin/postgres
Segmentation faultHi Alexander, I wonder if you can still reproduce this problem,
and if so whether the patch proposed at [1] fixes it.regards, tom lane
I don't have that device handy, but I've tried -fsanitize=hwaddress on two
ARM servers available, and despite other issues I've encountered, I could
reproduce the problem with __ubsan_default_options() on current master:
Program received signal SIGSEGV, Segmentation fault.
0x0000aaaaaed06e1c in __ubsan_default_options () at main.c:507
507 if (!reached_main)
(gdb) bt
#0 0x0000aaaaaed06e1c in __ubsan_default_options () at main.c:507
#1 0x0000aaaaac489d84 in __hwasan_init ()
#2 0x0000fffff7fc25c8 in _dl_init (main_map=0xfffff7fff380, argc=1, argv=0xfffffffff3f8, env=0xfffffffff408) at
./elf/dl-init.c:106
#3 0x0000fffff7fd8d38 in _start () at ../sysdeps/aarch64/dl-start.S:46
The proposed patch really eliminates it. Thank you for recalling this!
Best regards,
Alexander