Broken atomics code on PPC with FreeBSD 10.3
In pursuit of knowledge about clock_gettime(), I installed FreeBSD 10.3
on a PowerBook I had laying about. I was disappointed to find out that
HEAD fails regression tests on that platform:
*** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out Thu Dec 29 19:37:50 2016
--- /usr/home/tgl/pgsql/src/test/regress/results/lock.out Fri Dec 30 00:31:01 2016
***************
*** 63,70 ****
-- atomic ops tests
RESET search_path;
SELECT test_atomic_ops();
! test_atomic_ops
! -----------------
! t
! (1 row)
!
--- 63,66 ----
-- atomic ops tests
RESET search_path;
SELECT test_atomic_ops();
! ERROR: flag: set spuriously #2
======================================================================
After some digging I found out that the atomics code appears to believe
that the PPC platform has byte-wide atomics, which of course is nonsense.
That causes it to define pg_atomic_flag with the "char" variant, and
what we end up with is word-wide operations (lwarx and friends) operating
on a byte-wide struct. Failure is not exactly astonishing; what's
surprising is that we don't get stack-clobber core dumps or such.
Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work.
Perhaps it could be argued that there's a FreeBSD compiler bug here,
but what I'm wondering is why configure believes that this:
[char lock = 0;
__sync_lock_test_and_set(&lock, 1);
__sync_lock_release(&lock);])],
is going to draw a hard error on platforms without byte-wide atomics.
My sense of C semantics is that the best you could hope for is a warning
--- and a look at the config.log on this platform says we're not
even getting that.
In short, I'd put the blame here on a ridiculously optimistic configure
check.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
*** /usr/home/tgl/pgsql/src/test/regress/expected/lock.out Thu Dec 29 19:37:50 2016 --- /usr/home/tgl/pgsql/src/test/regress/results/lock.out Fri Dec 30 00:31:01 2016 *************** *** 63,70 **** -- atomic ops tests RESET search_path; SELECT test_atomic_ops(); ! test_atomic_ops ! ----------------- ! t ! (1 row) ! --- 63,66 ---- -- atomic ops tests RESET search_path; SELECT test_atomic_ops(); ! ERROR: flag: set spuriously #2======================================================================
Hm, not good.
After some digging I found out that the atomics code appears to believe
that the PPC platform has byte-wide atomics, which of course is nonsense.
That causes it to define pg_atomic_flag with the "char" variant, and
what we end up with is word-wide operations (lwarx and friends) operating
on a byte-wide struct. Failure is not exactly astonishing; what's
surprising is that we don't get stack-clobber core dumps or such.
Undef'ing HAVE_GCC__SYNC_CHAR_TAS makes it work.Perhaps it could be argued that there's a FreeBSD compiler bug here,
but what I'm wondering is why configure believes that this:[char lock = 0;
__sync_lock_test_and_set(&lock, 1);
__sync_lock_release(&lock);])],is going to draw a hard error on platforms without byte-wide atomics.
My sense of C semantics is that the best you could hope for is a
warning
Well, in theory these aren't actual functions but intrinsics with
special behaviour ("by being overloaded so that they work on multiple
types."). The compiler is supposed to warn and link to an external
function if a certain operation is not available:
Not all operations are supported by all target processors. If a
particular operation cannot be implemented on the target processor, a
warning is generated and a call to an external function is
generated. The external function carries the same name as the built-in
version, with an additional suffix ‘_n’ where n is the size of the data
type.
So that assumption made in that configure test doesn't seem that
unreasonable. What assembler do byte-wide atomics generate on that
platform? Which compiler/version is this?
Regards,
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
Perhaps it could be argued that there's a FreeBSD compiler bug here,
but what I'm wondering is why configure believes that this:[char lock = 0;
__sync_lock_test_and_set(&lock, 1);
__sync_lock_release(&lock);])],is going to draw a hard error on platforms without byte-wide atomics.
My sense of C semantics is that the best you could hope for is a
warning
Well, in theory these aren't actual functions but intrinsics with
special behaviour ("by being overloaded so that they work on multiple
types."). The compiler is supposed to warn and link to an external
function if a certain operation is not available:
Yeah, I read that. But configure tests don't normally notice warnings.
Therefore, even if the compiler is behaving fully per spec (and I think
FreeBSD's might not be), we are going to think that char and word-wide
operations are interchangeable even when one is an efficient, in-line
operation and the other is inefficiently emulated by a function or
kernel trap. This doesn't really seem good enough, especially for
standard architectures where we know darn well what ought to happen.
There's a reason why we've not converted s_lock.h to rely on compiler
intrinsics everywhere, and this is exactly it: the quality of the
intrinsics implementations are horridly variable.
So that assumption made in that configure test doesn't seem that
unreasonable. What assembler do byte-wide atomics generate on that
platform? Which compiler/version is this?
$ gcc -v
Using built-in specs.
Target: powerpc-undermydesk-freebsd
Configured with: FreeBSD/powerpc system compiler
Thread model: posix
gcc version 4.2.1 20070831 patched [FreeBSD]
I tried "gcc -Wall -O0 -g -S bogus.c" on this input:
int
test_tas_char(volatile char *ptr)
{
return __sync_lock_test_and_set(ptr, 1) == 0;
}
int
test_tas_int(volatile int *ptr)
{
return __sync_lock_test_and_set(ptr, 1) == 0;
}
and got no warnings and the attached output. I'm not very good at reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop. Even if that works (and apparently it does not),
we would not want to use it since it implies contention between adjacent
atomic flags.
BTW, I looked around to see why the PPC critters in the buildfarm aren't
failing, and the answer is that they all have compilers that are too old
to have __sync_lock_test_and_set at all, so that the configure probes
just fail outright. But the probes can't tell the difference between
implementations we want to use and implementations we don't.
regards, tom lane
Attachments:
bogus.stext/x-asm; charset=us-ascii; name=bogus.sDownload
On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
On 2016-12-30 00:44:33 -0500, Tom Lane wrote:
Perhaps it could be argued that there's a FreeBSD compiler bug here,
but what I'm wondering is why configure believes that this:[char lock = 0;
__sync_lock_test_and_set(&lock, 1);
__sync_lock_release(&lock);])],is going to draw a hard error on platforms without byte-wide
atomics.
My sense of C semantics is that the best you could hope for is a
warningWell, in theory these aren't actual functions but intrinsics with
special behaviour ("by being overloaded so that they work on multiple
types."). The compiler is supposed to warn and link to an external
function if a certain operation is not available:Yeah, I read that. But configure tests don't normally notice warnings.
IIRC those functions are *not* supposed to be provided, I.e. they should result in a linker error.
BTW, it's unclear why there's no 1 byte atomics support in that compiler, isn't it? On a ll/sc arch like ppc it should (and IIRC is) be supported efficiently. As you suggest, using a lwarx should just work.
I'm not very good at reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop. Even if that works (and apparently it does not),
we would not want to use it since it implies contention between adjacent
atomic flags.
That's the case as long as they're on a single cache line. Whether it's the same word, double/half word shouldn't matter.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and got no warnings and the attached output. I'm not very good at
reading
PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop.
Hm. This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection.
I'll note that there's certainly ppc64 machine with that intrinsic working (tested that on the community hydra during atomics development). So either it's a bug specific to some compiler version, or 32bit ppc.
I assume there's no trivial way to get a newer compiler on that machine?
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On December 30, 2016 4:48:22 PM GMT+01:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and got no warnings and the attached output. I'm not very good at
reading PPC assembler, but I think what is happening in the "char" case is that
gcc is trying to emulate a byte-wide operation using a word-wide one,
ie an lwarx/stwcx. loop.
Hm. This seems to suggest a straight out code generation bug in that compiler, not a failure in intrinsic detection.
It does smell like a codegen bug; I'm planning to file a FreeBSD bug
report once I have more data (like whether 11.0 is busted similarly).
But that doesn't really detract from my point, which is that it's
totally silly for us to imagine that char and word-wide atomic ops are
interchangeable on all platforms and we can flip a coin to decide which
to use as long as the configure probes don't fail outright. Even given
working code for the byte case, we ought to prefer int atomics on PPC.
On other platforms, maybe the preference goes the other way. I'd be
inclined to follow the hard-won knowledge embedded in s_lock.h about
which to prefer.
Or in words of one syllable: this isn't the first bug we've seen in
compiler intrinsics, and it won't be the last. We need a way to deal
with that, not just claim it isn't our problem.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
But that doesn't really detract from my point, which is that it's
totally silly for us to imagine that char and word-wide atomic ops are
interchangeable on all platforms and we can flip a coin to decide which
to use as long as the configure probes don't fail outright. Even given
working code for the byte case, we ought to prefer int atomics on PPC.
On other platforms, maybe the preference goes the other way. I'd be
inclined to follow the hard-won knowledge embedded in s_lock.h about
which to prefer.
After further study, I'm inclined to just propose that we flip the default
width of pg_atomic_flag in generic-gcc.h: use int not char if both are
available. The only modern platform where that's the wrong thing is x86,
and that case is irrelevant here because we'll be using arch-x86.h not
generic-gcc.h.
A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:
x86
sparc (but not sparcv9, there we use int)
m68k
vax
So basically, the existing coding is optimizing for obsolete hardware,
and not even very much of that.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
But that doesn't really detract from my point, which is that it's
totally silly for us to imagine that char and word-wide atomic ops are
interchangeable on all platforms and we can flip a coin to decide which
to use as long as the configure probes don't fail outright. Even given
working code for the byte case, we ought to prefer int atomics on PPC.
On other platforms, maybe the preference goes the other way. I'd be
inclined to follow the hard-won knowledge embedded in s_lock.h about
which to prefer.After further study, I'm inclined to just propose that we flip the default
width of pg_atomic_flag in generic-gcc.h: use int not char if both are
available. The only modern platform where that's the wrong thing is x86,
and that case is irrelevant here because we'll be using arch-x86.h not
generic-gcc.h.A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:x86
sparc (but not sparcv9, there we use int)
m68k
vax
I don't think that's right, because on my MacBook Pro:
(gdb) p sizeof(slock_t)
$1 = 1
(gdb)
On Linux x86_64:
(gdb) p sizeof(slock_t)
$1 = 1
I think we would be well-advised to get the size of slock_t down to a
single byte on as many platforms as possible, because when it's any
wider than that it makes some critical structures that would otherwise
fit into a cache line start to not fit, and that can have a very big
impact on performance. I'm disappointed to notice that it's 4 bytes
wide on hydra (ppc64).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 2, 2017 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
After further study, I'm inclined to just propose that we flip the default
width of pg_atomic_flag in generic-gcc.h: use int not char if both are
available. The only modern platform where that's the wrong thing is x86,
and that case is irrelevant here because we'll be using arch-x86.h not
generic-gcc.h.A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:x86
sparc (but not sparcv9, there we use int)
m68k
vax
I don't think that's right, because on my MacBook Pro:
... which is an x86, which won't be affected by the proposed change.
I think we would be well-advised to get the size of slock_t down to a
single byte on as many platforms as possible, because when it's any
wider than that it makes some critical structures that would otherwise
fit into a cache line start to not fit, and that can have a very big
impact on performance.
I really doubt that that's a good argument for choosing a markedly less
efficient locking primitive, which is what's at stake for PPC. I have
no info about the other architectures.
Also, since pg_atomic_flag is currently used in a grand total of zero
places (other than the test case in regress.c), arguing that making
it word-wide will bloat critical data structures is flat wrong.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:x86
sparc (but not sparcv9, there we use int)
m68k
vaxI don't think that's right, because on my MacBook Pro:
... which is an x86, which won't be affected by the proposed change.
Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64.
I think we would be well-advised to get the size of slock_t down to a
single byte on as many platforms as possible, because when it's any
wider than that it makes some critical structures that would otherwise
fit into a cache line start to not fit, and that can have a very big
impact on performance.I really doubt that that's a good argument for choosing a markedly less
efficient locking primitive, which is what's at stake for PPC. I have
no info about the other architectures.
I don't know for certain, but I would not be willing to take that on
faith. I'm not sure if you've read all of the discussion threads on
this mailing list about fitting things into cache lines and/or
aligning things to cache lines to avoid major performance regressions
on large servers, but there have been quite a few of those over the
last few years and there will doubtless be more.
Also, since pg_atomic_flag is currently used in a grand total of zero
places (other than the test case in regress.c), arguing that making
it word-wide will bloat critical data structures is flat wrong.
Well that just begs the question of whether we should rip it out. If
it's unused, then, yes, the performance characteristics don't matter,
but if it's gonna get used for anything important, I maintain that
both the speed of the implementation and the number of bytes that it
consumes will be relevant.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jan 3, 2017 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A survey of s_lock.h shows that we prefer char-width slock_t only on
these architectures:x86
sparc (but not sparcv9, there we use int)
m68k
vax
I don't think that's right, because on my MacBook Pro:
... which is an x86, which won't be affected by the proposed change.
Uh, x86 means 32-bit to me, and this MacBook Pro is definitely x86_64.
Sorry for the confusion. x86 subsumes x86_64 so far as the atomics
code is concerned, and I was following that terminology.
Also, since pg_atomic_flag is currently used in a grand total of zero
places (other than the test case in regress.c), arguing that making
it word-wide will bloat critical data structures is flat wrong.
Well that just begs the question of whether we should rip it out. If
it's unused, then, yes, the performance characteristics don't matter,
but if it's gonna get used for anything important, I maintain that
both the speed of the implementation and the number of bytes that it
consumes will be relevant.
[ shrug... ] I have not seen you putting any effort into optimizing
slock_t on non-x86 architectures, where it might make a difference today.
Why all the concern about shaving hypothetical future bytes for
pg_atomic_flag?
But to reduce this to the simplest possible terms: it really does not
matter whether the implementation saves bytes or cycles or anything else,
if it fails to *work*. The existing coding for PPC fails on FreeBSD,
and likely on some other platforms using the same compiler. I'd be the
first to say that that doesn't reflect well on the state of their PPC
port, but it's reality that we ought to be cognizant of. And we've
worked around similar bugs nearby, eg see this bit in atomics.h:
* Given a gcc-compatible xlc compiler, prefer the xlc implementation. The
* ppc64le "IBM XL C/C++ for Linux, V13.1.2" implements both interfaces, but
* __sync_lock_test_and_set() of one-byte types elicits SIGSEGV.
From here it seems like you're arguing that we mustn't give FreeBSD
the identical pass that we already gave IBM, for what is nearly the
same bug.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jan 3, 2017 at 12:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[ shrug... ] I have not seen you putting any effort into optimizing
slock_t on non-x86 architectures, where it might make a difference today.
Why all the concern about shaving hypothetical future bytes for
pg_atomic_flag?
I don't know what you're getting all wrapped around the axle here
about. I already explained why I think it's an issue. You can
disagree if you like. As to whether I've put any effort into
optimizing slock_t on non-x86 architectures, I commented in my first
post to this thread about my disappointment regarding the situation on
ppc64. I didn't realize that we had that issue and I think it would
be worth fixing at some point, but I haven't quite gotten around to it
in the 4 hours since I had that realization. I have previously done
work on non-x86 spinlocks, though
(c01c25fbe525869fa81237954727e1eb4b7d4a14).
But to reduce this to the simplest possible terms: it really does not
matter whether the implementation saves bytes or cycles or anything else,
if it fails to *work*. The existing coding for PPC fails on FreeBSD,
and likely on some other platforms using the same compiler. I'd be the
first to say that that doesn't reflect well on the state of their PPC
port, but it's reality that we ought to be cognizant of. And we've
worked around similar bugs nearby, eg see this bit in atomics.h:* Given a gcc-compatible xlc compiler, prefer the xlc implementation. The
* ppc64le "IBM XL C/C++ for Linux, V13.1.2" implements both interfaces, but
* __sync_lock_test_and_set() of one-byte types elicits SIGSEGV.From here it seems like you're arguing that we mustn't give FreeBSD
the identical pass that we already gave IBM, for what is nearly the
same bug.
The only point I'm making here is that the width of a spinlock is not
irrelevant. Or at least, it didn't use to be. lwlock.h says:
* LWLOCK_MINIMAL_SIZE should be 32 on basically all common platforms, but
* because slock_t is more than 2 bytes on some obscure platforms, we allow
* for the possibility that it might be 64.
That comment is actually nonsense since
008608b9d51061b1f598c197477b3dc7be9c4a64 but before that it was
relevant. Also, before 48354581a49c30f5757c203415aa8412d85b0f70, a
BufferDesc fit within a 64-byte cache line if slock_t was 1 or 2
bytes, a point commented on explicitly by
6150a1b08a9fe7ead2b25240be46dddeae9d98e1. So we've clearly made
efforts in that direction in the past. Looking a little more, though,
since both lwlock.c and bufmgr.c have been rewritten to use atomics
directly, there might not be any remaining places where the spinlocks
get hot enough to care about the extra bytes. All things being equal
I still think a narrower one is significantly better than a wider one,
but we can always leave solving that problem to a day when the
difference can be proved out by benchmarks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
The only point I'm making here is that the width of a spinlock is not
irrelevant.
Sure, but it does not follow that we need to get all hot and bothered
about the width of pg_atomic_flag, which has yet to find its first
use-case. When and if its width becomes a demonstrable issue,
we'll have some work to do in this area ... but that was true already.
All things being equal
I still think a narrower one is significantly better than a wider one,
but we can always leave solving that problem to a day when the
difference can be proved out by benchmarks.
I think we can agree on that conclusion.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers