Improving asan/ubsan support

Started by Alexander Lakhinabout 2 years ago3 messages
#1Alexander Lakhin
exclusion@gmail.com

[ this thread separated from [1]/messages/by-id/CWTLB2WWVJJ2.2YV6ERNOL1WVF@neon.tech as the discussion focus shifted ]

H Andres,

29.11.2023 22:39, Andres Freund wrote:

I use the following:
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
strict_init_order=1:detect_stack_use_after_return=0

I wonder if we should add some of these options by default ourselves. We could
e.g. add something like the __ubsan_default_options() in
src/backend/main/main.c to src/port/... instead, and return a combination of
"our" options (like detect_leaks=0) and the ones from the environment.

I think that such explicit expression of the project policy regarding
sanitizer checks is for good, but I see some obstacles on this way.

First, I'm not sure what to do with new useful options/maybe new option
values, that will appear in sanitizers eventually. Should the only options,
that are supported by all sanitizers' versions, be specified, or we may
expect that unsupported options/values would be ignored by old versions?

Second, what to do with other binaries, that need detect_leaks=0, for
example, that same ecpg?

ISTM that, if it actually works as I theorize it should, using
__attribute__((no_sanitize("address"))) would be the easiest approach
here. Something like

#if defined(__has_feature) && __has_feature(address_sanitizer)
#define pg_attribute_no_asan __attribute__((no_sanitize("address")))
#else
#define pg_attribute_no_asan
#endif

or such should work.

I've tried adding:
 bool
+__attribute__((no_sanitize("address")))
 stack_is_too_deep(void)

and it does work got me with clang 15, 18: `make check-world` passes with
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
strict_init_order=1:detect_stack_use_after_return=1
UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1

(with a fix for pg_bsd_indent applied [2]/messages/by-id/591971ce-25c1-90f3-0526-5f54e3ebb32e@gmail.com)

But with gcc 11, 12, 13 I get an assertion failure during `make check`:
#4  0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer ==
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66
#6  0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
skipFsync=true) at md.c:471
#7  0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
skipFsync=true) at smgr.c:501
#8  0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM,
permanent=true) at bufmgr.c:4386

The buffer (buf) declared as follows:
static void
RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
                               RelFileLocator dstlocator,
                               ForkNumber forkNum, bool permanent)
{
...
    PGIOAlignedBlock buf;
...

But as we can see, the buffer address is really not 4k-aligned, and that
offset 0x20 added in run-time only when the server started with
detect_stack_use_after_return=1.
So it looks like the asan feature detect_stack_use_after_return implemented
in gcc allows itself to add some data on stack, that breaks our alignment
expectations. With all three such Asserts in md.c removed,
`make check-world` passes for me.

One thing that's been holding me back on trying to do something around this is
the basically non-existing documentation around all of this. I haven't even
found documentation referencing the fact that there are headers like
sanitizer/asan_interface.h, you just have to figure that out yourself. Compare
that to something like valgrind, which has documented this at least somewhat.

Yes, so maybe it's reasonable to support only basic/common features (such
as detect_leaks), leaving advanced ones for ad-hoc usage till they prove
their worthiness.

Best regards,
Alexander

[1]: /messages/by-id/CWTLB2WWVJJ2.2YV6ERNOL1WVF@neon.tech
[2]: /messages/by-id/591971ce-25c1-90f3-0526-5f54e3ebb32e@gmail.com

#2Tristan Partin
tristan@neon.tech
In reply to: Alexander Lakhin (#1)
Re: Improving asan/ubsan support

On Fri Dec 1, 2023 at 3:00 AM CST, Alexander Lakhin wrote:

[ this thread separated from [1] as the discussion focus shifted ]

H Andres,

29.11.2023 22:39, Andres Freund wrote:

I use the following:
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
strict_init_order=1:detect_stack_use_after_return=0

I wonder if we should add some of these options by default ourselves. We could
e.g. add something like the __ubsan_default_options() in
src/backend/main/main.c to src/port/... instead, and return a combination of
"our" options (like detect_leaks=0) and the ones from the environment.

I think that such explicit expression of the project policy regarding
sanitizer checks is for good, but I see some obstacles on this way.

First, I'm not sure what to do with new useful options/maybe new option
values, that will appear in sanitizers eventually. Should the only options,
that are supported by all sanitizers' versions, be specified, or we may
expect that unsupported options/values would be ignored by old versions?

Second, what to do with other binaries, that need detect_leaks=0, for
example, that same ecpg?

ISTM that, if it actually works as I theorize it should, using
__attribute__((no_sanitize("address"))) would be the easiest approach
here. Something like

#if defined(__has_feature) && __has_feature(address_sanitizer)
#define pg_attribute_no_asan __attribute__((no_sanitize("address")))
#else
#define pg_attribute_no_asan
#endif

or such should work.

I've tried adding:
 bool
+__attribute__((no_sanitize("address")))
 stack_is_too_deep(void)

and it does work got me with clang 15, 18: `make check-world` passes with
ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:print_stacktrace=1:\
disable_coredump=0:strict_string_checks=1:check_initialization_order=1:\
strict_init_order=1:detect_stack_use_after_return=1
UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1

(with a fix for pg_bsd_indent applied [2])

But with gcc 11, 12, 13 I get an assertion failure during `make check`:
#4  0x00007fabadcd67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000557f35260382 in ExceptionalCondition (conditionName=0x557f35ca51a0 "(uintptr_t) buffer ==
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x557f35ca4fc0 "md.c", lineNumber=471) at assert.c:66
#6  0x0000557f34a3b2bc in mdextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
skipFsync=true) at md.c:471
#7  0x0000557f34a45a6f in smgrextend (reln=0x6250000375c8, forknum=MAIN_FORKNUM, blocknum=18, buffer=0x7fabaa800020,
skipFsync=true) at smgr.c:501
#8  0x0000557f349139ed in RelationCopyStorageUsingBuffer (srclocator=..., dstlocator=..., forkNum=MAIN_FORKNUM,
permanent=true) at bufmgr.c:4386

The buffer (buf) declared as follows:
static void
RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
                               RelFileLocator dstlocator,
                               ForkNumber forkNum, bool permanent)
{
...
    PGIOAlignedBlock buf;
...

But as we can see, the buffer address is really not 4k-aligned, and that
offset 0x20 added in run-time only when the server started with
detect_stack_use_after_return=1.
So it looks like the asan feature detect_stack_use_after_return implemented
in gcc allows itself to add some data on stack, that breaks our alignment
expectations. With all three such Asserts in md.c removed,
`make check-world` passes for me.

Decided to do some digging into this, and Google actually documents[0]https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#algorithm
how it works. After reading the algorithm, it is obvious why this fails.
What happens if you throw an __attribute__((no_sanitize("address")) on
the function? I assume the Asserts would then pass. The commit[1]https://github.com/postgres/postgres/commit/faeedbcefd40bfdf314e048c425b6d9208896d90 which
added pg_attribute_aligned() provides insight as to why the Asserts
exist.

/* If this build supports direct I/O, the buffer must be I/O aligned. */

Disabling instrumentation in functions which use this specific type when
the build supports direct IO seems like the best solution.

One thing that's been holding me back on trying to do something around this is
the basically non-existing documentation around all of this. I haven't even
found documentation referencing the fact that there are headers like
sanitizer/asan_interface.h, you just have to figure that out yourself. Compare
that to something like valgrind, which has documented this at least somewhat.

Yes, so maybe it's reasonable to support only basic/common features (such
as detect_leaks), leaving advanced ones for ad-hoc usage till they prove
their worthiness.

Possibly, but I think I would rather see upstream support running with
all features with instrumentation turned off in various sections of
code. Even some assistance from AddressSanitizer is better than none.
Here[1]https://github.com/postgres/postgres/commit/faeedbcefd40bfdf314e048c425b6d9208896d90[2]https://github.com/google/sanitizers/wiki/AddressSanitizerFlags are all the AddressSanitizer flags for those curious.

Best regards,
Alexander

[1] /messages/by-id/CWTLB2WWVJJ2.2YV6ERNOL1WVF@neon.tech
[2] /messages/by-id/591971ce-25c1-90f3-0526-5f54e3ebb32e@gmail.com

I personally would like to see Postgres have support for
AddressSanitizer. I think it already supports UndefinedBehaviorSanitizer
if I am remembering the buildfarm properly. AddressSanitizer has been so
helpful in past experiences writing C.

[0]: https://github.com/google/sanitizers/wiki/AddressSanitizerUseAfterReturn#algorithm
[1]: https://github.com/postgres/postgres/commit/faeedbcefd40bfdf314e048c425b6d9208896d90
[2]: https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
[3]: https://github.com/google/sanitizers/wiki/SanitizerCommonFlags

--
Tristan Partin
Neon (https://neon.tech)

#3Alexander Lakhin
exclusion@gmail.com
In reply to: Tristan Partin (#2)
Re: Improving asan/ubsan support

Hello Tristan,

02.12.2023 00:48, Tristan Partin wrote:

So it looks like the asan feature detect_stack_use_after_return implemented
in gcc allows itself to add some data on stack, that breaks our alignment
expectations. With all three such Asserts in md.c removed,
`make check-world` passes for me.

Decided to do some digging into this, and Google actually documents[0] how it works. After reading the algorithm, it
is obvious why this fails. What happens if you throw an __attribute__((no_sanitize("address")) on the function? I
assume the Asserts would then pass. The commit[1] which added pg_attribute_aligned() provides insight as to why the
Asserts exist.

Thank you for spending your time on this!

Yes, I understand what those Asserts were added for, I removed them just
to check what else is on the way.
And I can confirm that marking that function with the no_sanitize attribute
fixes that exact failure. Then the same attribute has to be added to
_hash_alloc_buckets(), to prevent:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 471, PID: 1766976

#5  0x00005594a6a0f0d0 in ExceptionalCondition (conditionName=0x5594a7454a60 "(uintptr_t) buffer ==
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x5594a7454880 "md.c", lineNumber=471) at assert.c:66
#6  0x00005594a61ce133 in mdextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020,
skipFsync=false) at md.c:471
#7  0x00005594a61d89ab in smgrextend (reln=0x625000037e48, forknum=MAIN_FORKNUM, blocknum=9, buffer=0x7fc3b3947020,
skipFsync=false) at smgr.c:501
#8  0x00005594a4a0c43d in _hash_alloc_buckets (rel=0x7fc3a89714f8, firstblock=6, nblocks=4) at hashpage.c:1033

And to RelationCopyStorage(), to prevent:
TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 752, PID: 1787855

#5  0x000056081d5688bc in ExceptionalCondition (conditionName=0x56081dfaea40 "(uintptr_t) buffer ==
TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)", fileName=0x56081dfae860 "md.c", lineNumber=752) at assert.c:66
#6  0x000056081cd29415 in mdread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at md.c:752
#7  0x000056081cd32cb3 in smgrread (reln=0x629000043158, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fe480633020) at
smgr.c:565
#8  0x000056081b9ed5f2 in RelationCopyStorage (src=0x629000043158, dst=0x629000041248, forkNum=MAIN_FORKNUM,
relpersistence=112 'p') at storage.c:487

Probably, it has to be added for all the functions where PGIOAlignedBlock
located on stack...

But I still wonder, how it works with clang, why that extra attribute is
not required?
In other words, such implementation specifics discourage me...

Possibly, but I think I would rather see upstream support running with all features with instrumentation turned off in
various sections of code. Even some assistance from AddressSanitizer is better than none. Here[1][2] are all the
AddressSanitizer flags for those curious.

Yeah, and you might also need to specify extra flags to successfully run
postgres with newer sanitizers' versions. Say, for clang-18 you need to
specify -fno-sanitize=function (which is not recognized by gcc 13.2), to
avoid errors like this:
running bootstrap script ... dynahash.c:1120:4: runtime error: call to function strlcpy through pointer to incorrect
function type 'void *(*)(void *, const void *, unsigned long)'
.../src/port/strlcpy.c:46: note: strlcpy defined here
    #0 0x556af5e0b0a9 in hash_search_with_hash_value .../src/backend/utils/hash/dynahash.c:1120:4
    #1 0x556af5e08f4f in hash_search .../src/backend/utils/hash/dynahash.c:958:9

I personally would like to see Postgres have support for  AddressSanitizer. I think it already supports
UndefinedBehaviorSanitizer if I am remembering the buildfarm properly. AddressSanitizer has been so helpful in past
experiences writing C.

Me too. I find it very valuable for my personal usage but I'm afraid it's
still not very stable/mature.
One more example. Just adding -fsanitize=undefined for gcc 12, 13 (I tried
12.1, 13.0, 13.2) produces new warnings like:
In function 'PageGetItemId',
    inlined from 'heap_xlog_update' at heapam.c:9569:9:
../../../../src/include/storage/bufpage.h:243:16: warning: array subscript -1 is below array bounds of 'ItemIdData[]'
[-Warray-bounds=]
  243 |         return &((PageHeader) page)->pd_linp[offsetNumber - 1];
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

But I don't get such warnings when I use gcc 11.3 (though it generates
other ones) or clang (15, 18). They also aren't produced with -O0, -O1...
Maybe it's another gcc bug, I'm not sure how to deal with it.
(I can research this issue, if it makes any sense.)

So I would say that cost of providing/maintaining full support for asan
(hwasan), ubsan is not near zero, unfortunately. I would estimate it to
10-20 discussions/commits on start/5-10 per year later (not including fixes
for bugs that would be found). If it's affordable for the project, I'd like
to have such support out-of-the-box.

Best regards,
Alexander