Fix some ubsan/asan related issues
Patch 1:
Passing NULL as a second argument to memcpy breaks ubsan, and there
didn't seem to be anything preventing that in the LogLogicalMessage()
codepath. Here is a preventative measure in LogLogicalMessage() and an
Assert() in CopyXLogRecordToWAL().
Patch 2:
Support building with -Db_sanitize=address in Meson. Various executables
are leaky which can cause the builds to fail and tests to fail, when we
are fine leaking this memory.
Personally, I am a big stickler for always freeing my memory in
executables even if it gets released at program exit because it makes
the leak sanitizer much more effective. However (!), I am not here to
change the philosophy of memory management in one-off executables, so
I have just silenced memory leaks in various executables for now.
Patch 3:
THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!
In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.
Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.
---
I would like to see patch 1 reviewed and committed. Patch 2 honestly
doesn't matter unless asan support can be fixed. I can also add a patch
that errors out the Meson build if asan support is requested. That way
others don't spend time heading down a dead end.
--
Tristan Partin
Neon (https://neon.tech)
Spend so much time writing out the email, I once again forget
attachments...UGH.
--
Tristan Partin
Neon (https://neon.tech)
Attachments:
v1-0001-Refuse-to-register-message-in-LogLogicalMessage-i.patchtext/x-patch; charset=utf-8; name=v1-0001-Refuse-to-register-message-in-LogLogicalMessage-i.patchDownload+3-2
v1-0002-meson-Support-compiling-with-Db_sanitize-address.patchtext/x-patch; charset=utf-8; name=v1-0002-meson-Support-compiling-with-Db_sanitize-address.patchDownload+50-1
v1-0003-Max-out-max_stack_depth-when-asan-is-enabled.patchtext/x-patch; charset=utf-8; name=v1-0003-Max-out-max_stack_depth-when-asan-is-enabled.patchDownload+4-1
On 30/01/2024 17:57, Tristan Partin wrote:
Patch 1:
Passing NULL as a second argument to memcpy breaks ubsan, and there
didn't seem to be anything preventing that in the LogLogicalMessage()
codepath. Here is a preventative measure in LogLogicalMessage() and an
Assert() in CopyXLogRecordToWAL().
For the audience: We ran into this one with the neon extension. The
solution is to call LogLogicalMessage("", 0, ...) instead of
LogLogicalMessage(NULL, 0, ...). . But it's true that it's pointlessfor
LogLogicalMessage to call XLogRegisterData() if the message is empty. If
we do this, we should check for 'size == 0' rather than 'message == NULL'.
Patch 2:
Support building with -Db_sanitize=address in Meson. Various executables
are leaky which can cause the builds to fail and tests to fail, when we
are fine leaking this memory.Personally, I am a big stickler for always freeing my memory in
executables even if it gets released at program exit because it makes
the leak sanitizer much more effective. However (!), I am not here to
change the philosophy of memory management in one-off executables, so
I have just silenced memory leaks in various executables for now.Patch 3:
THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!
In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.
I'm a bit confused by these. We already run with sanitizer in the cirrus
CI. What does this enable that we're not already doing?
--
Heikki Linnakangas
Neon (https://neon.tech)
Hi,
On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote:
On 30/01/2024 17:57, Tristan Partin wrote:
In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.I'm a bit confused by these. We already run with sanitizer in the cirrus CI.
What does this enable that we're not already doing?
The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
The checks are actually quite useful, so making our stack depth check work
with asan would be worthwhile.
I discussed this in a bit more in
/messages/by-id/20231129193920.4vphw7dqxjzf5v5b@awork3.anarazel.de
Greetings,
Andres Freund
Hi,
On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Mon, 29 Jan 2024 18:03:39 -0600
Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
NULL
If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
the API contract of memcpy in glibc. The two pointer arguments are
marked as nonnull, even in the event the amount to copy is 0 bytes.
It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?
From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Wed, 24 Jan 2024 17:07:01 -0600
Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=addressThe ecpg is parser is extremely leaky, so we need to silence leak
detection.
This does stuff beyond epcg...
+if get_option('b_sanitize').contains('address') + cdata.set('USE_ADDRESS_SANITIZER', 1) +endif############################################################### # NLS / Gettext diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index ac409b0006..e18e716d9c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -338,6 +338,17 @@ do { \ output_failed = true, output_errno = errno; \ } while (0)+#ifdef USE_ADDRESS_SANITIZER
When asan is used __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.
+const char *__asan_default_options(void); + +const char *__asan_default_options(void) +{ + return "detect_leaks=0"; +} + +#endif
Wonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...
Greetings,
Andres Freund
Hello,
30.01.2024 18:57, Tristan Partin wrote:
Patch 1:
Passing NULL as a second argument to memcpy breaks ubsan, ...
Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('<x/>',
$$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
</xsl:template>
</xsl:stylesheet>$$);
varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
/messages/by-id/591971ce-25c1-90f3-0526-5f54e3ebb32e@gmail.com
31.01.2024 00:23, Andres Freund wrote:
The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth
(currently 2048kB), after ensuring the platform's stack depth limit is adequate.
(All the other tests pass.)
Though the same test passes when I use clang-16.
Best regards,
Alexander
On Tue Jan 30, 2024 at 10:00 PM CST, Alexander Lakhin wrote:
Hello,
30.01.2024 18:57, Tristan Partin wrote:
Patch 1:
Passing NULL as a second argument to memcpy breaks ubsan, ...
Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('<x/>',
$$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
</xsl:template>
</xsl:stylesheet>$$);varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
/messages/by-id/591971ce-25c1-90f3-0526-5f54e3ebb32e@gmail.com31.01.2024 00:23, Andres Freund wrote:
The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth
(currently 2048kB), after ensuring the platform's stack depth limit is adequate.(All the other tests pass.)
Though the same test passes when I use clang-16.
Thanks Alexander! I will try and take a look at these.
--
Tristan Partin
Neon (https://neon.tech)
On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
Hi,
On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Mon, 29 Jan 2024 18:03:39 -0600
Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
NULLIf this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
the API contract of memcpy in glibc. The two pointer arguments are
marked as nonnull, even in the event the amount to copy is 0 bytes.It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?
Dropped. Will change on the Neon side. Should we add an assert
somewhere for good measure? Near the memcpy in question?
From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Wed, 24 Jan 2024 17:07:01 -0600
Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=addressThe ecpg is parser is extremely leaky, so we need to silence leak
detection.This does stuff beyond epcg...
Dropped.
+if get_option('b_sanitize').contains('address') + cdata.set('USE_ADDRESS_SANITIZER', 1) +endif############################################################### # NLS / Gettext diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index ac409b0006..e18e716d9c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -338,6 +338,17 @@ do { \ output_failed = true, output_errno = errno; \ } while (0)+#ifdef USE_ADDRESS_SANITIZER
When asan is used __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.
Thanks!
+const char *__asan_default_options(void); + +const char *__asan_default_options(void) +{ + return "detect_leaks=0"; +} + +#endifWonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...
See attached patches. Here is what I found to be necessary to get
a -Db_sanitize=address,undefined build to successfully make it through
tests. I do have a few concerns about the patch.
1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
sanitizer is enabled. So you will see me use this, to make some
include directives work. I don't like this as a final solution
because someone could use -fsanitize=leak.
2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
(test.sql) is a fairly minimal reproduction of what is needed to show
the leak. I didn't spend too much time tracking it down. Might get to
it later, who knows. Below you will find the backtrace, and whoever
wants to try their hand at fixing it will need to comment out
xmlNewNode in the leak.supp file.
3. I don't love the new library name. Maybe it should be name more lsan
specific.
4. Should pg_attribute_no_asan be renamed to
pg_attribute_no_sanitize_address? That would match
pg_attribute_no_sanitize_alignment.
I will also attach a Meson test log for good measure. I didn't try
testing anything that requires PG_TEST_EXTRA, but I suspect that
everything will be fine.
Alexander, I haven't yet gotten to the things you pointed out in the
sibling thread.
==221848==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 120 byte(s) in 1 object(s) allocated from:
#0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
#1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
#3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
#4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
#5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
#6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
#7 0x109655d in ExecProject ../src/include/executor/executor.h:389
#8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
#9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
#10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
#11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
#12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
#13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
#14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
#15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
#16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
#17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
#18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
#19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
#20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
#21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
#22 0x11a5f67 in main ../src/backend/main/main.c:198
#23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#25 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
Indirect leak of 13 byte(s) in 1 object(s) allocated from:
#0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
#1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
#4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
#5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
#6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
#7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
#8 0x109655d in ExecProject ../src/include/executor/executor.h:389
#9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
#10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
#11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
#12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
#13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
#14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
#15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
#16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
#17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
#18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
#19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
#20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
#21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
#22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
#23 0x11a5f67 in main ../src/backend/main/main.c:198
#24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#26 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)
SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
--
Tristan Partin
Neon (https://neon.tech)
Attachments:
v2-0001-Add-pg_attribute_no_asan.patchtext/x-patch; charset=utf-8; name=v2-0001-Add-pg_attribute_no_asan.patchDownload+7-1
v2-0002-Fix-stack-depth-checking-with-AddressSanitizer-en.patchtext/x-patch; charset=utf-8; name=v2-0002-Fix-stack-depth-checking-with-AddressSanitizer-en.patchDownload+4-5
v2-0003-Add-libpgbincommon.patchtext/x-patch; charset=utf-8; name=v2-0003-Add-libpgbincommon.patchDownload+21-1
v2-0004-Add-instrumentation-to-disable-the-LeakSanitizer-.patchtext/x-patch; charset=utf-8; name=v2-0004-Add-instrumentation-to-disable-the-LeakSanitizer-.patchDownload+74-30
Hi Tristan,
On Tue, Feb 6, 2024 at 11:53 AM Tristan Partin <tristan@neon.tech> wrote:
On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
Hi,
On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Mon, 29 Jan 2024 18:03:39 -0600
Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
NULLIf this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
the API contract of memcpy in glibc. The two pointer arguments are
marked as nonnull, even in the event the amount to copy is 0 bytes.It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
that something useful?Dropped. Will change on the Neon side. Should we add an assert
somewhere for good measure? Near the memcpy in question?From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
From: Tristan Partin <tristan@neon.tech>
Date: Wed, 24 Jan 2024 17:07:01 -0600
Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=addressThe ecpg is parser is extremely leaky, so we need to silence leak
detection.This does stuff beyond epcg...
Dropped.
+if get_option('b_sanitize').contains('address') + cdata.set('USE_ADDRESS_SANITIZER', 1) +endif############################################################### # NLS / Gettext diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index ac409b0006..e18e716d9c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -338,6 +338,17 @@ do { \ output_failed = true, output_errno = errno; \ } while (0)+#ifdef USE_ADDRESS_SANITIZER
When asan is used __SANITIZE_ADDRESS__ is defined, so we don't need to
implement this ourselves.Thanks!
+const char *__asan_default_options(void); + +const char *__asan_default_options(void) +{ + return "detect_leaks=0"; +} + +#endifWonder if we should move this into some static library and link it into all
binaries that don't want leak detection? It doesn't seem great to have to
adjust this in a bunch of files if we want to adjust the options...See attached patches. Here is what I found to be necessary to get
a -Db_sanitize=address,undefined build to successfully make it through
tests. I do have a few concerns about the patch.1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
sanitizer is enabled. So you will see me use this, to make some
include directives work. I don't like this as a final solution
because someone could use -fsanitize=leak.
2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
(test.sql) is a fairly minimal reproduction of what is needed to show
the leak. I didn't spend too much time tracking it down. Might get to
it later, who knows. Below you will find the backtrace, and whoever
wants to try their hand at fixing it will need to comment out
xmlNewNode in the leak.supp file.
3. I don't love the new library name. Maybe it should be name more lsan
specific.
4. Should pg_attribute_no_asan be renamed to
pg_attribute_no_sanitize_address? That would match
pg_attribute_no_sanitize_alignment.I will also attach a Meson test log for good measure. I didn't try
testing anything that requires PG_TEST_EXTRA, but I suspect that
everything will be fine.Alexander, I haven't yet gotten to the things you pointed out in the
sibling thread.==221848==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 120 byte(s) in 1 object(s) allocated from:
#0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
#1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
#3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
#4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
#5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
#6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
#7 0x109655d in ExecProject ../src/include/executor/executor.h:389
#8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
#9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
#10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
#11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
#12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
#13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
#14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
#15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
#16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
#17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
#18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
#19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
#20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
#21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
#22 0x11a5f67 in main ../src/backend/main/main.c:198
#23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#25 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)Indirect leak of 13 byte(s) in 1 object(s) allocated from:
#0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
#1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
#3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
#4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
#5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
#6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
#7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
#8 0x109655d in ExecProject ../src/include/executor/executor.h:389
#9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
#10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
#11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
#12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
#13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
#14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
#15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
#16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
#17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
#18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
#19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
#20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
#21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
#22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
#23 0x11a5f67 in main ../src/backend/main/main.c:198
#24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId: 7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
#26 0x49c5d4 in _start (/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4) (BuildId: c8ca341e1303be0f9dc0b0271c55c4b9e42af89b)SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
--
Tristan Partin
Neon (https://neon.tech)
I tried your v1-0002, it works at compile phase but failed to run initdb
with the following leak detected:
=================================================================
==64983==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 248 byte(s) in 1 object(s) allocated from:
#0 0x7fc7729df9cf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x55bff5480e8b in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:190
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
Indirect leak of 19 byte(s) in 1 object(s) allocated from:
#0 0x7fc77299777b in __interceptor_strdup
../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x55bff5480f41 in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:198
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
I worked around by moving *new_environ* as a global variable, I think this is
false positive report so maybe this deserves a patch?
I tried to apply your v2 patch set but v2-0004 seems out of date.
--
Regards
Junwang Zhao
On Mon Sep 16, 2024 at 9:29 AM EDT, Junwang Zhao wrote:
I tried your v1-0002, it works at compile phase but failed to run initdb
with the following leak detected:=================================================================
==64983==ERROR: LeakSanitizer: detected memory leaksDirect leak of 248 byte(s) in 1 object(s) allocated from:
#0 0x7fc7729df9cf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x55bff5480e8b in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:190
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58Indirect leak of 19 byte(s) in 1 object(s) allocated from:
#0 0x7fc77299777b in __interceptor_strdup
../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
#1 0x55bff5480f41 in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:198
#2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
#3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58I worked around by moving *new_environ* as a global variable, I think this is
false positive report so maybe this deserves a patch?I tried to apply your v2 patch set but v2-0004 seems out of date.
Thanks for giving it a try Junwang. I'll rebase this, hopefully tomorrow
and give it another push.
--
Tristan Partin
Neon (https://neon.tech)