Overflow hazard in pgbench
moonjelly just reported an interesting failure [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17. It seems that
with the latest bleeding-edge gcc, this code is misoptimized:
/* check random range */
if (imin > imax)
{
pg_log_error("empty range given to random");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
pg_log_error("random range is too large");
return false;
}
such that the second if-test doesn't fire. Now, according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen, whereupon the second
if-block is provably unreachable. The failure still represents a gcc
bug, because we're using -fwrapv which should disable that assumption.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way. I suggest applying the attached in
branches that have the required functions.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17
regards, tom lane
Attachments:
avoid-pgbench-overflow-hazard-1.patchtext/x-diff; charset=us-ascii; name=avoid-pgbench-overflow-hazard-1.patchDownload+4-2
I wrote:
... according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen, whereupon the second
if-block is provably unreachable. The failure still represents a gcc
bug, because we're using -fwrapv which should disable that assumption.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way.
BTW, for grins I tried building today's HEAD without -fwrapv, using
gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC)
which is the newest version I have at hand. Not very surprisingly,
that reproduced the failure shown on moonjelly. However, after adding
the patch I proposed, "make check-world" passed! I was not expecting
that result; I supposed we still had lots of lurking assumptions of
traditional C overflow handling.
I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
show that we have no such assumptions, only that they must be lurking
in darker, poorly-tested corners, and (b) I'm not aware of any reason
to think that removing -fwrapv would provide benefits worth taking any
risks for. But we may be closer to being able to do without that
switch than I thought.
regards, tom lane
Hello Tom,
moonjelly just reported an interesting failure [1].
I noticed. I was planning to have a look at it, thanks for digging!
It seems that with the latest bleeding-edge gcc, this code is
misoptimized:
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
pg_log_error("random range is too large");
return false;
}such that the second if-test doesn't fire. Now, according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen,
Hmmm, so it is not possible to detect these with standard arithmetic as
done by this code. Note that the code was written in 2016, ISTM pre C99
Postgres. I'm unsure about what a C compiler could assume on the previous
standard wrt integer arithmetic.
whereupon the second if-block is provably unreachable.
Indeed.
The failure still represents a gcc bug, because we're using -fwrapv
which should disable that assumption.
Ok, I'll report it.
I also see a good point with pgbench tests exercising edge cases.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way.
Ok.
I suggest applying the attached in branches that have the required
functions.
The latest gcc, recompiled yesterday, is still wrong, as shown by
moonjelly current status.
The proposed patch does fix the issue in pgbench TAP test. I'd suggest to
add unlikely() on all these conditions, as already done elsewhere. See
attached version.
I confirm that check-world passed with gcc head and its broken -fwrapv.
I also recompiled after removing manually -fwrapv: Make check, pgbench TAP
tests and check-world all passed. I'm not sure that edge case are well
enough tested in postgres to be sure that all is ok just from these runs
though.
--
Fabien.
Attachments:
avoid-pgbench-overflow-hazard-2.patchtext/x-diff; name=avoid-pgbench-overflow-hazard-2.patchDownload+5-3
Fabien COELHO <coelho@cri.ensmp.fr> writes:
I suggest applying the attached in branches that have the required
functions.
The proposed patch does fix the issue in pgbench TAP test. I'd suggest to
add unlikely() on all these conditions, as already done elsewhere. See
attached version.
Done that way, though I'm skeptical that it makes any measurable
difference.
I also recompiled after removing manually -fwrapv: Make check, pgbench TAP
tests and check-world all passed. I'm not sure that edge case are well
enough tested in postgres to be sure that all is ok just from these runs
though.
Yeah, I'm afraid that in most places it'd take a specifically-designed
test case to expose a problem, if there is one.
regards, tom lane
The failure still represents a gcc bug, because we're using -fwrapv which
should disable that assumption.Ok, I'll report it.
Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254
--
Fabien.
Hello Tom,
The failure still represents a gcc bug, because we're using -fwrapv which
should disable that assumption.Ok, I'll report it.
Fixed at r12-1916-ga96d8d67d0073a7031c0712bc3fb7759417b2125
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a96d8d67d0073a7031c0712bc3fb7759417b2125
Just under 10 hours from the bug report…
--
Fabien.
Hi,
On 2021-06-27 16:21:46 -0400, Tom Lane wrote:
BTW, for grins I tried building today's HEAD without -fwrapv, using
gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC)
which is the newest version I have at hand. Not very surprisingly,
that reproduced the failure shown on moonjelly. However, after adding
the patch I proposed, "make check-world" passed! I was not expecting
that result; I supposed we still had lots of lurking assumptions of
traditional C overflow handling.
We did fix a lot of them a few years back...
I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
show that we have no such assumptions, only that they must be lurking
in darker, poorly-tested corners, and (b) I'm not aware of any reason
to think that removing -fwrapv would provide benefits worth taking any
risks for. But we may be closer to being able to do without that
switch than I thought.
Lack of failures after removing frwapv itself doesn't prove that much -
very commonly the compiler won't optimize based on the improved
knowledge about value range. Additionally we probably don't exercise all
effected places in our tests.
ubsan is able to catch all signed overflows. The last time I played
around with that, tests still were hitting quite a few cases of
overflows. But most not in particularly interesting places
(e.g. cash_out, RIGHTMOST_ONE()) but also a few where it might be worth
being careful about it in case a compiler disregards -fwrapv or doesn't
implement it (e.g. _dorand48()).
It might be worth setting up a bf animal with ubsan and enabled overflow
checking...
Greetings,
Andres Freund