Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Started by Ranier Vilelaover 5 years ago27 messages
#1Ranier Vilela
ranier.vf@gmail.com

Hi,

Per Clang UBSan
Clang 10 (64 bits)
Postgres 14 (latest)

2020-08-27 01:02:14.930 -03 client backend[42432] pg_regress/create_table
STATEMENT: create table defcheck_0 partition of defcheck for values in (0);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2,
which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior indexcmds.c:1162:22
in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

clog.c (299):
memcmp(subxids, MyProc->subxids.xids,
nsubxids * sizeof(TransactionId)) == 0)

xact.c (5285)
memcpy(&workspace[i], s->childXids,
s->nChildXids * sizeof(TransactionId));

snapmgr.c (590)
memcpy(CurrentSnapshot->xip, sourcesnap->xip,
sourcesnap->xcnt * sizeof(TransactionId));
snapmgr.c (594)
memcpy(CurrentSnapshot->subxip, sourcesnap->subxip,
sourcesnap->subxcnt * sizeof(TransactionId));

copyfuncs.c:1190
COPY_POINTER_FIELD(uniqColIdx, from->uniqNumCols * sizeof(AttrNumber));

1.STATEMENT: CREATE TABLESPACE regress_tblspacewith LOCATION
'/usr/src/postgres/src/test/regress/testtablespace' WITH
(some_nonexistent_parameter = true);
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
2.STATEMENT: CREATE TABLE testschema.dflt (a int PRIMARY KEY USING INDEX
TABLESPACE regress_tblspace) PARTITION BY LIST (a);
indexcmds.c:1162:22: runtime error: null pointer passed as argument 2,
which is declared to never be null
3.STATEMENT: SELECT bool 'nay' AS error;
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
4.STATEMENT: SELECT U&'wrong: +0061' UESCAPE '+';
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
5. STATEMENT: ALTER TABLE circles ADD EXCLUDE USING gist
(c1 WITH &&, (c2::circle) WITH &&);
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is
declared to never be null
6.STATEMENT: COMMENT ON CONSTRAINT the_constraint ON DOMAIN
no_comments_dom IS 'another bad comment';
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which
is declared to never be null
7.STATEMENT: create trigger my_table_col_update_trig
after update of b on my_table referencing new table as new_table
for each statement execute procedure dump_insert();
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
xact.c:5285:25: runtime error: null pointer passed as argument 2, which is
declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior xact.c:5285:25 in
snapmgr.c:590:31: runtime error: null pointer passed as argument 2, which
is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:590:31 in
snapmgr.c:594:34: runtime error: null pointer passed as argument 2, which
is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior snapmgr.c:594:34 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in 8.
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
clog.c:299:10: runtime error: null pointer passed as argument 1, which is
declared to never be null
/usr/include/string.h:65:33: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clog.c:299:10 in
8.STATEMENT: select array_fill(1, array[[1,2],[3,4]]);
copyfuncs.c:1190:2: runtime error: null pointer passed as argument 2, which
is declared to never be null

I stopped counting clog.c (299).
If anyone wants, the full report, it has 2mb.

Ranier Vilela

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ranier Vilela (#1)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On 2020-Aug-27, Ranier Vilela wrote:

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Ranier Vilela
ranier.vf@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <
alvherre@2ndquadrant.com> escreveu:

On 2020-Aug-27, Ranier Vilela wrote:

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

Hi Álvaro,
If we are passing a null pointer in these places and it should not be done,
it is a sign that perhaps these calls should not or should not be made, and
they can be avoided.
This would eliminate undefined behavior and save some cycles?

regards,
Ranier Vilela

#4Ranier Vilela
ranier.vf@gmail.com
In reply to: Alvaro Herrera (#2)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em qui., 27 de ago. de 2020 às 13:57, Alvaro Herrera <
alvherre@2ndquadrant.com> escreveu:

On 2020-Aug-27, Ranier Vilela wrote:

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

See at:
https://postgrespro.com/list/thread-id/1870065
"NULL passed as an argument to memcmp() in parse_func.c
<https://postgrespro.com/list/id/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10@phx.gbl&gt;
"

regards,
Ranier Vilela

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ranier Vilela (#3)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On 2020-Aug-27, Ranier Vilela wrote:

If we are passing a null pointer in these places and it should not be done,
it is a sign that perhaps these calls should not or should not be made, and
they can be avoided.

Feel free to send a patch.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Noah Misch
noah@leadboat.com
In reply to: Alvaro Herrera (#2)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:

On 2020-Aug-27, Ranier Vilela wrote:

indexcmds.c (1162):
memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.

Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

Later decision to stop changing such code:
/messages/by-id/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#6)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Noah Misch <noah@leadboat.com> writes:

On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.
Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

Later decision to stop changing such code:
/messages/by-id/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

I agree that this seems academic for any sane implementation of memcmp
and friends. If the function is not allowed to fetch or store any bytes
when the length parameter is zero, which it certainly is not, then how
could it matter whether the pointer parameter is NULL? It would be
interesting to know the rationale behind the C standard's claim that
this case should be undefined.

Having said that, I think that the actual risk here has to do not with
what memcmp() might do, but with what gcc might do in code surrounding
the call, once it's armed with the assumption that any pointer we pass
to memcmp() could not be null. See

/messages/by-id/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10@phx.gbl

It's surely not hard to visualize cases where necessary code could
be optimized away if the compiler thinks it's entitled to assume
such things.

In other words, the C standard made a damfool decision and now we need
to deal with the consequences of that as perpetrated by other fools.
Still, it's all hypothetical so far --- does anyone have examples of
actual rather than theoretical issues?

regards, tom lane

#8Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#7)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:

Looks legit, and at least per commit 13bba02271dc we do fix such things,
even if it's useless in practice.
Given that no buildfarm member has ever complained, this exercise seems
pretty pointless.

Later decision to stop changing such code:
/messages/by-id/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

I think that the actual risk here has to do not with
what memcmp() might do, but with what gcc might do in code surrounding
the call, once it's armed with the assumption that any pointer we pass
to memcmp() could not be null. See

/messages/by-id/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10@phx.gbl

It's surely not hard to visualize cases where necessary code could
be optimized away if the compiler thinks it's entitled to assume
such things.

Good point. We could pick from a few levels of concern:

- No concern: reject changes serving only to remove this class of deviation.
This is today's policy.
- Medium concern: accept fixes, but the buildfarm continues not to break in
the face of new deviations. This will make some code uglier, but we'll be
ready against some compiler growing the optimization you describe.
- High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm member
thorntail. In addition to the drawback of the previous level, this will
create urgent work for committers introducing new deviations (or introducing
test coverage that unearths old deviations). This is our current response
to Valgrind complaints, for example.

#9Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#7)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em sex., 28 de ago. de 2020 às 00:11, Tom Lane <tgl@sss.pgh.pa.us> escreveu:

In other words, the C standard made a damfool decision and now we need
to deal with the consequences of that as perpetrated by other fools.
Still, it's all hypothetical so far --- does anyone have examples of
actual rather than theoretical issues?

I still think the value of this alert would be to avoid the call.
Why do memcmp have to deal with a NULL value?
clog.c: 299, it is a case outside the curve, there are hundreds of calls in
the report.
It must be very difficult to correct, but if TransactionIdSetPageStatus was
not called in these cases, memcmp, it would not have to deal with the NULL
pointer.

regards,
Ranier Vilela

In reply to: Noah Misch (#8)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Thu, Aug 27, 2020 at 11:04 PM Noah Misch <noah@leadboat.com> wrote:

On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:

It's surely not hard to visualize cases where necessary code could
be optimized away if the compiler thinks it's entitled to assume
such things.

Good point.

I wonder if we should start using -fno-delete-null-pointer-checks:

https://lkml.org/lkml/2018/4/4/601

This may not be strictly relevant to the discussion, but I was
reminded of it just now and thought I'd mention it.

--
Peter Geoghegan

#11Ranier Vilela
ranier.vf@gmail.com
In reply to: Noah Misch (#8)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em sex., 28 de ago. de 2020 às 03:04, Noah Misch <noah@leadboat.com>
escreveu:

On Thu, Aug 27, 2020 at 11:11:47PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Thu, Aug 27, 2020 at 12:57:20PM -0400, Alvaro Herrera wrote:

Looks legit, and at least per commit 13bba02271dc we do fix such

things,

even if it's useless in practice.
Given that no buildfarm member has ever complained, this exercise

seems

pretty pointless.

Later decision to stop changing such code:

/messages/by-id/flat/e1a26ece-7057-a234-d87e-4ce1cdc9eaa0@2ndquadrant.com

I think that the actual risk here has to do not with
what memcmp() might do, but with what gcc might do in code surrounding
the call, once it's armed with the assumption that any pointer we pass
to memcmp() could not be null. See

/messages/by-id/BLU437-SMTP48A5B2099E7134AC6BE7C7F2A10@phx.gbl

It's surely not hard to visualize cases where necessary code could
be optimized away if the compiler thinks it's entitled to assume
such things.

Good point. We could pick from a few levels of concern:

- No concern: reject changes serving only to remove this class of
deviation.
This is today's policy.
- Medium concern: accept fixes, but the buildfarm continues not to break in
the face of new deviations. This will make some code uglier, but we'll
be
ready against some compiler growing the optimization you describe.
- High concern: I remove -fno-sanitize=nonnull-attribute from buildfarm
member
thorntail. In addition to the drawback of the previous level, this will
create urgent work for committers introducing new deviations (or
introducing
test coverage that unearths old deviations). This is our current
response
to Valgrind complaints, for example.

Maybe in this specific case, the policy could be ignored, this change does
not hurt.

--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -293,7 +293,7 @@ TransactionIdSetPageStatus(TransactionId xid, int
nsubxids,
  * sub-XIDs and all of the XIDs for which we're adjusting clog should be
  * on the same page.  Check those conditions, too.
  */
- if (all_xact_same_page && xid == MyProc->xid &&
+ if (all_xact_same_page && subxids && xid == MyProc->xid &&
  nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
  nsubxids == MyProc->subxidStatus.count &&
  memcmp(subxids, MyProc->subxids.xids,

regards,
Ranier Vilela

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#10)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Peter Geoghegan <pg@bowt.ie> writes:

I wonder if we should start using -fno-delete-null-pointer-checks:
https://lkml.org/lkml/2018/4/4/601
This may not be strictly relevant to the discussion, but I was
reminded of it just now and thought I'd mention it.

Hmm. gcc 8.3 defines this as:

Assume that programs cannot safely dereference null pointers, and
that no code or data element resides at address zero. This option
enables simple constant folding optimizations at all optimization
levels. In addition, other optimization passes in GCC use this
flag to control global dataflow analyses that eliminate useless
checks for null pointers; these assume that a memory access to
address zero always results in a trap, so that if a pointer is
checked after it has already been dereferenced, it cannot be null.

AFAICS, that's a perfectly valid assumption for our usage. I can see why
the kernel might not want it, but we set things up whenever possible to
ensure that dereferencing NULL would crash.

However, while grepping the manual for that I also found

'-Wnull-dereference'
Warn if the compiler detects paths that trigger erroneous or
undefined behavior due to dereferencing a null pointer. This
option is only active when '-fdelete-null-pointer-checks' is
active, which is enabled by optimizations in most targets. The
precision of the warnings depends on the optimization options used.

I wonder whether turning that on would find anything interesting.

regards, tom lane

#13Ranier Vilela
ranier.vf@gmail.com
In reply to: Tom Lane (#12)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760

#include <stdio.h>
#include <stdint.h>

int main()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
unsigned int diffu32 = 0;
unsigned long int diffu64 = 0;
unsigned long int diff64 = 0;
int32_t diff = 0;

    diff = (int32_t) (d1 - d2);
    diff64 =  ((long int) d1 - (long int) d2);
    diffu32 = (unsigned int) (d1 - d2);
    diffu64 = (unsigned long int) (d1 - d2);
printf("d1=%u\n", d1);
printf("d2=%u\n", d2);
printf("diff=%d\n", diff);
printf("diffu32=%u\n", diffu32);
printf("diff64=%ld\n", diff64);
printf("diffu64=%lu\n", diffu64);

return 0;
}

output:
d1=3
d2=4294901763
diff=65536
diffu32=65536
diff64=-4294901760
diffu64=65536

(With Ubuntu 64 bits + clang 10)
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763
cannot be represented in type 'unsigned int'
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;

if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);

diff = (int32) (id1 - id2);
return (diff < 0);
}

This works, all time or really with bad numbers can break?
I would like to know, why doesn't it work?

With Windows 10 (64 bits) + msvc 2019 (64 bits)
bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
/*
* If either ID is a permanent XID then we can just do unsigned
* comparison. If both are normal, do a modulo-2^32 comparison.
*/
int32 diff;
int64 diff64;

if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);

diff = (int32) (id1 - id2);
diff64 = ((int64) id1 - (int64) id2);
    fprintf(stderr, "id1=%lu\n", id1);
    fprintf(stderr, "id2=%lu\n", id1);
    fprintf(stderr, "diff32=%ld\n", diff);
    fprintf(stderr, "diff64=%lld\n", diff64);
return (diff64 < 0);
}

id1=498
id2=498
diff32=200000000
diff64=-4094967296
2020-08-31 12:46:30.422 -03 [8908] WARNING: oldest xmin is far in the past
2020-08-31 12:46:30.422 -03 [8908] HINT: Close open transactions soon to
avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.

id1=4
id2=4
diff32=-494
diff64=-494

id1=4
id2=4
diff32=50000000
diff64=-4244967296
2020-08-31 12:46:30.423 -03 [8908] FATAL: found xmin 4 from before
relfrozenxid 4244967300
2020-08-31 12:46:30.423 -03 [8908] CONTEXT: while scanning block 0 and
offset 1 of relation "pg_catalog.pg_depend"
2020-08-31 12:46:30.423 -03 [8908] STATEMENT: VACUUM FREEZE;

Most of the time:
id1=498
id2=498
diff32=0
diff64=0
id1=498
id2=498
diff32=0
diff64=0

regards,
Ranier Vilela

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ranier Vilela (#13)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On 2020-Aug-31, Ranier Vilela wrote:

More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760

Did you compile this with gcc -fwrapv?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Ranier Vilela
ranier.vf@gmail.com
In reply to: Alvaro Herrera (#14)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <
alvherre@2ndquadrant.com> escreveu:

On 2020-Aug-31, Ranier Vilela wrote:

More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760

Did you compile this with gcc -fwrapv?

gcc 10.2 -O2 -fwrapv
bool test1()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
long int diff64 = 0;

diff64 = ((long int) d1 - (long int) d2);

return (diff64 < 0);
}

output:
mov eax, 1
ret

What is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763
cannot be represented in type 'unsigned int'

Ranier Vilela

#16Ranier Vilela
ranier.vf@gmail.com
In reply to: Ranier Vilela (#15)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela <ranier.vf@gmail.com>
escreveu:

Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <
alvherre@2ndquadrant.com> escreveu:

On 2020-Aug-31, Ranier Vilela wrote:

More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760

Did you compile this with gcc -fwrapv?

gcc 10.2 -O2 -fwrapv
bool test1()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
long int diff64 = 0;

diff64 = ((long int) d1 - (long int) d2);

return (diff64 < 0);
}

output:
mov eax, 1
ret

What is a workaround for msvc 2019 (64 bits) and clang 64 bits (linux)?
transam.c:311:22: runtime error: unsigned integer overflow: 3 - 4294901763
cannot be represented in type 'unsigned int'

with Debug:
#include <stdio.h>
#include <stdint.h>

bool test1(void)
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
int32_t diff;

    diff = (int32_t) (d1 - d2);

return (diff < 0);
}

gcc 10.2 -g
output:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 3
mov DWORD PTR [rbp-8], -65533
mov eax, DWORD PTR [rbp-4]
sub eax, DWORD PTR [rbp-8]
mov DWORD PTR [rbp-12], eax
mov eax, DWORD PTR [rbp-12]
shr eax, 31
pop rbp
ret

it is possible to conclude that:
1. TransactionIdPrecedes works in release mode, because the compiler treats
undefined-behavior and corrects it,
treating a possible overflow.
2. TransactionIdPrecedes does not work in debug mode, and overflow occurs.
3. TransactionID cannot contain the largest possible ID or an invalid ID
(4294901763) has been generated and passed to TransactionIdPrecedes.

Ranier Vilela

#17Andres Freund
andres@anarazel.de
In reply to: Ranier Vilela (#16)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Hi,

On August 31, 2020 11:08:49 AM PDT, Ranier Vilela <ranier.vf@gmail.com> wrote:

Em seg., 31 de ago. de 2020 às 14:43, Ranier Vilela
<ranier.vf@gmail.com>
escreveu:

Em seg., 31 de ago. de 2020 às 14:00, Alvaro Herrera <
alvherre@2ndquadrant.com> escreveu:

On 2020-Aug-31, Ranier Vilela wrote:

More troubles with undefined-behavior.

This type of code can leaves overflow:
var = (cast) (expression);
diff = (int32) (id1 - id2);

See:
diff64 = ((long int) d1 - (long int) d2);
diff64=-4294901760

Did you compile this with gcc -fwrapv?

gcc 10.2 -O2 -fwrapv
bool test1()
{
unsigned int d1 = 3;
unsigned int d2 = 4294901763;
long int diff64 = 0;

diff64 = ((long int) d1 - (long int) d2);

return (diff64 < 0);
}

output:
mov eax, 1
ret

What is a workaround for msvc 2019 (64 bits) and clang 64 bits

(linux)?

transam.c:311:22: runtime error: unsigned integer overflow: 3 -

4294901763

cannot be represented in type 'unsigned int'

Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.

Andres

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

In reply to: Andres Freund (#17)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:

Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.

--
Peter Geoghegan

#19Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#18)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Hi,

On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote:

On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:

Unsigned integer overflow is well defined in the standard. So I don't understand what this is purporting to warn about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.

Well, my point is that there's no point in discussing unsigned integer
overflow, since it's precisely specified. And hence I don't understand
what we're discussing in this sub-thread.

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says:

-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
the result of an unsigned integer computation cannot be represented in
its type. Unlike signed integer overflow, this is not undefined
behavior, but it is often unintentional. This sanitizer does not check
for lossy implicit conversions performed before such a computation
(see -fsanitize=implicit-conversion).

So it seems Rainier needs to turn this test off, because it actually is
intentional.

Greetings,

Andres Freund

#20Ranier Vilela
ranier.vf@gmail.com
In reply to: Peter Geoghegan (#18)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em seg., 31 de ago. de 2020 às 16:39, Peter Geoghegan <pg@bowt.ie> escreveu:

On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:

Unsigned integer overflow is well defined in the standard. So I don't

understand what this is purporting to warn about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.

4294901763 can not store at unsigned int (TransactionID is uint32_t).
TransactionId id2 at TransactionIdPrecedes already has an overflow, before
anything is done.

Ranier Vilela

#21Ranier Vilela
ranier.vf@gmail.com
In reply to: Andres Freund (#19)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Em seg., 31 de ago. de 2020 às 17:05, Andres Freund <andres@anarazel.de>
escreveu:

Hi,

On 2020-08-31 12:38:51 -0700, Peter Geoghegan wrote:

On Mon, Aug 31, 2020 at 11:42 AM Andres Freund <andres@anarazel.de>

wrote:

Unsigned integer overflow is well defined in the standard. So I don't

understand what this is purporting to warn about.

Presumably it's simply warning that the value -4294901760 (i.e. the
result of 3 - 4294901763) cannot be faithfully represented as an
unsigned int. This is true, of course. It's just not relevant.

I'm pretty sure that UBSan does not actually state that this is
undefined behavior. At least Ranier's sample output didn't seem to
indicate it.

Well, my point is that there's no point in discussing unsigned integer
overflow, since it's precisely specified. And hence I don't understand
what we're discussing in this sub-thread.

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html says:

-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where
the result of an unsigned integer computation cannot be represented in
its type. Unlike signed integer overflow, this is not undefined
behavior, but it is often unintentional. This sanitizer does not check
for lossy implicit conversions performed before such a computation
(see -fsanitize=implicit-conversion).

So it seems Rainier needs to turn this test off, because it actually is
intentional.

No problem.
If intentional, the code at TransactionIdPrecedes, already knows that
overflow can occur
and trusts that the compiler will save it.

Ranier Vilela

#22Andres Freund
andres@anarazel.de
In reply to: Ranier Vilela (#21)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Hi,

On 2020-08-31 17:35:14 -0300, Ranier Vilela wrote:

Em seg., 31 de ago. de 2020 �s 17:05, Andres Freund <andres@anarazel.de>
escreveu:

So it seems Rainier needs to turn this test off, because it actually is
intentional.

No problem.
If intentional, the code at TransactionIdPrecedes, already knows that
overflow can occur
and trusts that the compiler will save it.

I don't know what you mean with "saving" it. Again, unsigned integer
overflow is well specified in C. All that's needed is for the compiler
to implement normal C.

#23Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#12)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Sat, Aug 29, 2020 at 12:36:42PM -0400, Tom Lane wrote:

Peter Geoghegan <pg@bowt.ie> writes:

I wonder if we should start using -fno-delete-null-pointer-checks:
https://lkml.org/lkml/2018/4/4/601
This may not be strictly relevant to the discussion, but I was
reminded of it just now and thought I'd mention it.

Hmm. gcc 8.3 defines this as:

Assume that programs cannot safely dereference null pointers, and
that no code or data element resides at address zero. This option
enables simple constant folding optimizations at all optimization
levels. In addition, other optimization passes in GCC use this
flag to control global dataflow analyses that eliminate useless
checks for null pointers; these assume that a memory access to
address zero always results in a trap, so that if a pointer is
checked after it has already been dereferenced, it cannot be null.

AFAICS, that's a perfectly valid assumption for our usage. I can see why
the kernel might not want it, but we set things up whenever possible to
ensure that dereferencing NULL would crash.

We do assume dereferencing NULL would crash, but we also assume this
optimization doesn't happen:

=== opt-null.c
#include <string.h>
#include <unistd.h>

int my_memcpy(void *dest, const void *src, size_t n)
{
#ifndef REMOVE_MEMCPY
memcpy(dest, src, n);
#endif
if (src)
pause();
return 0;
}
===

$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ diff -sU1 <(gcc -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -O2 -S -o- opt-null.c)
--- /dev/fd/63  2020-09-03 19:23:53.206864378 -0700
+++ /dev/fd/62  2020-09-03 19:23:53.206864378 -0700
@@ -8,13 +8,8 @@
        .cfi_startproc
-       pushq   %rbx
+       subq    $8, %rsp
        .cfi_def_cfa_offset 16
-       .cfi_offset 3, -16
-       movq    %rsi, %rbx
        call    memcpy@PLT
-       testq   %rbx, %rbx
-       je      .L2
        call    pause@PLT
-.L2:
        xorl    %eax, %eax
-       popq    %rbx
+       addq    $8, %rsp
        .cfi_def_cfa_offset 8
$ diff -sU1 <(gcc -DREMOVE_MEMCPY -O2 -fno-delete-null-pointer-checks -S -o- opt-null.c) <(gcc -DREMOVE_MEMCPY -O2 -S -o- opt-null.c)
Files /dev/fd/63 and /dev/fd/62 are identical

So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or
remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail.

However, while grepping the manual for that I also found

'-Wnull-dereference'
Warn if the compiler detects paths that trigger erroneous or
undefined behavior due to dereferencing a null pointer. This
option is only active when '-fdelete-null-pointer-checks' is
active, which is enabled by optimizations in most targets. The
precision of the warnings depends on the optimization options used.

I wonder whether turning that on would find anything interesting.

Promising. Sadly, it doesn't warn for the above test case.

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#23)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Noah Misch <noah@leadboat.com> writes:

We do assume dereferencing NULL would crash, but we also assume this
optimization doesn't happen:

#ifndef REMOVE_MEMCPY
memcpy(dest, src, n);
#endif
if (src)
pause();

[ gcc believes the if-test is unnecessary ]

Hm. I would not blame that on -fdelete-null-pointer-checks per se.
Rather the problem is what we were touching on before: the dubious
but standard-approved assumption that memcpy's arguments cannot be
null.

If there actually are places where this is a problem, I think we
need to fix it by doing

if (n > 0)
memcpy(dest, src, n);

so that the compiler can no longer assume that src!=NULL even
when n is zero. I'd still leave -fdelete-null-pointer-checks
enabled, because it can make valid and useful optimizations in
other cases. (Besides that, it's far from clear that disabling
that flag would suppress all bad consequences of the assumption
that memcpy's arguments aren't null.)

regards, tom lane

In reply to: Tom Lane (#24)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. I would not blame that on -fdelete-null-pointer-checks per se.
Rather the problem is what we were touching on before: the dubious
but standard-approved assumption that memcpy's arguments cannot be
null.

Isn't it both, together? That is, it's the combination of that
assumption alongside -fdelete-null-pointer-checks's actual willingness
to propagate the assumption.

I'd still leave -fdelete-null-pointer-checks
enabled, because it can make valid and useful optimizations in
other cases.

Is there any evidence that that's true? I wouldn't assume that the gcc
people exercised good judgement here.

--
Peter Geoghegan

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#25)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

Peter Geoghegan <pg@bowt.ie> writes:

On Thu, Sep 3, 2020 at 7:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd still leave -fdelete-null-pointer-checks
enabled, because it can make valid and useful optimizations in
other cases.

Is there any evidence that that's true? I wouldn't assume that the gcc
people exercised good judgement here.

I have not actually dug for examples, but the sort of situation where
I think it would help us is that macros or static inlines could contain
null tests that can be proven useless at particular call sites due to
surrounding code.

regards, tom lane

#27Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#24)
Re: Clang UndefinedBehaviorSanitize (Postgres14) Detected undefined-behavior

On Thu, Sep 03, 2020 at 10:53:37PM -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

We do assume dereferencing NULL would crash, but we also assume this
optimization doesn't happen:

#ifndef REMOVE_MEMCPY
memcpy(dest, src, n);
#endif
if (src)
pause();

[ gcc believes the if-test is unnecessary ]

So yes, it would be reasonable to adopt -fno-delete-null-pointer-checks and/or
remove -fno-sanitize=nonnull-attribute from buildfarm member thorntail.

If there actually are places where this is a problem, I think we
need to fix it by doing

if (n > 0)
memcpy(dest, src, n);

so that the compiler can no longer assume that src!=NULL even
when n is zero. I'd still leave -fdelete-null-pointer-checks
enabled, because it can make valid and useful optimizations in
other cases. (Besides that, it's far from clear that disabling
that flag would suppress all bad consequences of the assumption
that memcpy's arguments aren't null.)

Your proposal is what I had in mind when I wrote "remove
-fno-sanitize=nonnull-attribute from buildfarm member thorntail", and I agree
it's attractive. In particular, gcc is not likely to be the last compiler to
attempt such an optimization, and other compilers may not offer
-fno-delete-null-pointer-checks or equivalent.

One might argue for -fno-delete-null-pointer-checks in addition, because it
would defend against cases that sanitizers miss. I tend to think that's
overkill, but maybe not. I suppose one could do an audit, diffing the
generated code with and without the option.