valgrind versus pg_atomic_init()

Started by Tom Laneover 5 years ago14 messages

tgl@sss.pgh.pa.us

over 5 years ago

I experimented with running "make check" on ARM64 under a reasonably
bleeding-edge valgrind (3.16.0). One thing I ran into is that
regress.c's test_atomic_ops fails; valgrind shows the stack trace

fun:__aarch64_cas8_acq_rel
fun:pg_atomic_compare_exchange_u64_impl
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
fun:pg_atomic_init_u64
fun:test_atomic_uint64
fun:test_atomic_ops
fun:ExecInterpExpr

Now, this is basically the same thing as is already memorialized in
src/tools/valgrind.supp:

# Atomic writes to 64bit atomic vars uses compare/exchange to
# guarantee atomic writes of 64bit variables. pg_atomic_write is used
# during initialization of the atomic variable; that leads to an
# initial read of the old, undefined, memory value. But that's just to
# make sure the swap works correctly.
{
uninitialized_atomic_init_u64
Memcheck:Cond
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
}

so my first thought was that we just needed an architecture-specific
variant of that. But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses, so I don't
see why we should need to tolerate a valgrind exception here. Moreover,
if a simple assignment *isn't* good enough, then surely the spinlock
version in atomics.c is 100% broken.

regards, tom lane

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Tom Lane (#1)

Re: valgrind versus pg_atomic_init()

Hi,

On 2020-06-07 00:23:35 -0400, Tom Lane wrote:

I experimented with running "make check" on ARM64 under a reasonably
bleeding-edge valgrind (3.16.0). One thing I ran into is that
regress.c's test_atomic_ops fails; valgrind shows the stack trace

fun:__aarch64_cas8_acq_rel
fun:pg_atomic_compare_exchange_u64_impl
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
fun:pg_atomic_init_u64
fun:test_atomic_uint64
fun:test_atomic_ops
fun:ExecInterpExpr

Now, this is basically the same thing as is already memorialized in
src/tools/valgrind.supp:

# Atomic writes to 64bit atomic vars uses compare/exchange to
# guarantee atomic writes of 64bit variables. pg_atomic_write is used
# during initialization of the atomic variable; that leads to an
# initial read of the old, undefined, memory value. But that's just to
# make sure the swap works correctly.
{
uninitialized_atomic_init_u64
Memcheck:Cond
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
}

so my first thought was that we just needed an architecture-specific
variant of that. But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses, so I don't
see why we should need to tolerate a valgrind exception here. Moreover,
if a simple assignment *isn't* good enough, then surely the spinlock
version in atomics.c is 100% broken.

Yea, it could just do that. It seemed slightly easier/clearer, back when
I wrote it, to just use pg_atomic_write* for the initialization, but
this seems enough of a reason to stop doing so. Will change it in all
branches, unless somebody sees a reason to not do so?

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Andres Freund (#2)

Re: valgrind versus pg_atomic_init()

Andres Freund <andres@anarazel.de> writes:

On 2020-06-07 00:23:35 -0400, Tom Lane wrote:

so my first thought was that we just needed an architecture-specific
variant of that. But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses, so I don't
see why we should need to tolerate a valgrind exception here. Moreover,
if a simple assignment *isn't* good enough, then surely the spinlock
version in atomics.c is 100% broken.

Yea, it could just do that. It seemed slightly easier/clearer, back when
I wrote it, to just use pg_atomic_write* for the initialization, but
this seems enough of a reason to stop doing so. Will change it in all
branches, unless somebody sees a reason to not do so?

Works for me.

regards, tom lane

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Tom Lane (#3)

Re: valgrind versus pg_atomic_init()

Hi,

On 2020-06-08 18:21:06 -0400, Tom Lane wrote:

Yea, it could just do that. It seemed slightly easier/clearer, back when
I wrote it, to just use pg_atomic_write* for the initialization, but
this seems enough of a reason to stop doing so. Will change it in all
branches, unless somebody sees a reason to not do so?

Works for me.

And done.

- Andres

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Andres Freund (#4)

Re: valgrind versus pg_atomic_init()

Andres Freund <andres@anarazel.de> writes:

And done.

LGTM, thanks!

regards, tom lane

Noah Misch

noah@leadboat.com

over 5 years ago

In reply to: Tom Lane (#1)

Re: valgrind versus pg_atomic_init()

On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:

I experimented with running "make check" on ARM64 under a reasonably
bleeding-edge valgrind (3.16.0). One thing I ran into is that
regress.c's test_atomic_ops fails; valgrind shows the stack trace

fun:__aarch64_cas8_acq_rel
fun:pg_atomic_compare_exchange_u64_impl
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
fun:pg_atomic_init_u64
fun:test_atomic_uint64
fun:test_atomic_ops
fun:ExecInterpExpr

Now, this is basically the same thing as is already memorialized in
src/tools/valgrind.supp:

# Atomic writes to 64bit atomic vars uses compare/exchange to
# guarantee atomic writes of 64bit variables. pg_atomic_write is used
# during initialization of the atomic variable; that leads to an
# initial read of the old, undefined, memory value. But that's just to
# make sure the swap works correctly.
{
uninitialized_atomic_init_u64
Memcheck:Cond
fun:pg_atomic_exchange_u64_impl
fun:pg_atomic_write_u64_impl
fun:pg_atomic_init_u64_impl
}

so my first thought was that we just needed an architecture-specific
variant of that. But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses

Does something guarantee the write will be globally-visible by the time the
first concurrent accessor shows up? (If not, one could (a) do an unlocked
ptr->value=0, then the atomic write, or (b) revert and improve the
suppression.) I don't doubt it's fine for the ways PostgreSQL uses atomics
today, which generally initialize an atomic before the concurrent-accessor
processes even exist.

, so I don't
see why we should need to tolerate a valgrind exception here. Moreover,
if a simple assignment *isn't* good enough, then surely the spinlock
version in atomics.c is 100% broken.

Are you saying it would imply a bug in atomics.c pg_atomic_init_u64_impl(),
pg_atomic_compare_exchange_u64_impl(), or pg_atomic_fetch_add_u64_impl()? Can
you explain that more? If you were referring to unlocked "*(lock) = 0", that
is different since it's safe to have a delay in propagation of the change from
locked state to unlocked state.

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Noah Misch (#6)

Re: valgrind versus pg_atomic_init()

Noah Misch <noah@leadboat.com> writes:

On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:

... But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses

Does something guarantee the write will be globally-visible by the time the
first concurrent accessor shows up? (If not, one could (a) do an unlocked
ptr->value=0, then the atomic write, or (b) revert and improve the
suppression.) I don't doubt it's fine for the ways PostgreSQL uses atomics
today, which generally initialize an atomic before the concurrent-accessor
processes even exist.

Perhaps it'd be worth putting a memory barrier at the end of the _init
function(s)? As you say, this is hypothetical right now, but that'd be
a cheap improvement.

if a simple assignment *isn't* good enough, then surely the spinlock
version in atomics.c is 100% broken.

Are you saying it would imply a bug in atomics.c pg_atomic_init_u64_impl(),
pg_atomic_compare_exchange_u64_impl(), or pg_atomic_fetch_add_u64_impl()? Can
you explain that more?

My point was that doing SpinLockInit while somebody else is already trying
to acquire or release the spinlock is not going to work out well. So
there has to be a really clear boundary between "initialization" mode
and "use" mode; which is more or less the same point you make above.

In practice, if that line is so fine that we need a memory sync operation
to enforce it, things are probably broken anyhow.

regards, tom lane

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Noah Misch (#6)

Re: valgrind versus pg_atomic_init()

Hi,

On 2020-06-14 18:55:27 -0700, Noah Misch wrote:

Does something guarantee the write will be globally-visible by the time the
first concurrent accessor shows up?

The function comments say:

*
* Has to be done before any concurrent usage..
*
* No barrier semantics.

(If not, one could (a) do an unlocked ptr->value=0, then the atomic
write, or (b) revert and improve the suppression.) I don't doubt it's
fine for the ways PostgreSQL uses atomics today, which generally
initialize an atomic before the concurrent-accessor processes even
exist.

I think it's unlikely that there are cases where you could safely
initialize the atomic without needing some form of synchronization
before it can be used. If a barrier were needed, what'd guarantee the
concurrent access happened after the initialization in the first place?

Greetings,

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Tom Lane (#7)

Re: valgrind versus pg_atomic_init()

Hi,

On 2020-06-14 22:30:25 -0400, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:

... But on thinking more about this, it seems like
generic.h's version of pg_atomic_init_u64_impl is just fundamentally
misguided. Why isn't it simply assigning the value with an ordinary
unlocked write? By definition, we had better not be using this function
in any circumstance where there might be conflicting accesses

Does something guarantee the write will be globally-visible by the time the
first concurrent accessor shows up? (If not, one could (a) do an unlocked
ptr->value=0, then the atomic write, or (b) revert and improve the
suppression.) I don't doubt it's fine for the ways PostgreSQL uses atomics
today, which generally initialize an atomic before the concurrent-accessor
processes even exist.

Perhaps it'd be worth putting a memory barrier at the end of the _init
function(s)? As you say, this is hypothetical right now, but that'd be
a cheap improvement.

I don't think it'd be that cheap for some cases. There's an atomic for
every shared buffer, making their initialization full memory barriers
would likely be noticable for larger shared_buffers values.

As you say:

In practice, if that line is so fine that we need a memory sync operation
to enforce it, things are probably broken anyhow.

Greetings,

Andres Freund

#10

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Andres Freund (#9)

Re: valgrind versus pg_atomic_init()

Andres Freund <andres@anarazel.de> writes:

On 2020-06-14 22:30:25 -0400, Tom Lane wrote:

Perhaps it'd be worth putting a memory barrier at the end of the _init
function(s)? As you say, this is hypothetical right now, but that'd be
a cheap improvement.

I don't think it'd be that cheap for some cases. There's an atomic for
every shared buffer, making their initialization full memory barriers
would likely be noticable for larger shared_buffers values.

Fair point --- if we did need to do something to make this safer, doing it
at the level of individual atomic values would be the wrong thing anyway.

regards, tom lane

#11

Noah Misch

noah@leadboat.com

over 5 years ago

In reply to: Andres Freund (#8)

Re: valgrind versus pg_atomic_init()

On Sun, Jun 14, 2020 at 09:16:20PM -0700, Andres Freund wrote:

On 2020-06-14 18:55:27 -0700, Noah Misch wrote:

Does something guarantee the write will be globally-visible by the time the
first concurrent accessor shows up?

The function comments say:

*
* Has to be done before any concurrent usage..
*
* No barrier semantics.

(If not, one could (a) do an unlocked ptr->value=0, then the atomic
write, or (b) revert and improve the suppression.) I don't doubt it's
fine for the ways PostgreSQL uses atomics today, which generally
initialize an atomic before the concurrent-accessor processes even
exist.

I think it's unlikely that there are cases where you could safely
initialize the atomic without needing some form of synchronization
before it can be used. If a barrier were needed, what'd guarantee the
concurrent access happened after the initialization in the first place?

Suppose the initializing process does:

pg_atomic_init_u64(&somestruct->atomic, 123);
somestruct->atomic_ready = true;

In released versions, any process observing atomic_ready==true will observe
the results of the pg_atomic_init_u64(). After the commit from this thread,
that's no longer assured. Having said that, I agree with a special case of
Tom's assertion about spinlocks, namely that this has same problem:

/* somestruct->lock already happens to contain value 0 */
SpinLockInit(&somestruct->lock);
somestruct->lock_ready = true;

#12

Andres Freund

andres@anarazel.de

over 5 years ago

In reply to: Noah Misch (#11)

Re: valgrind versus pg_atomic_init()

Hi,

On June 16, 2020 8:24:29 PM PDT, Noah Misch <noah@leadboat.com> wrote:

On Sun, Jun 14, 2020 at 09:16:20PM -0700, Andres Freund wrote:

On 2020-06-14 18:55:27 -0700, Noah Misch wrote:

Does something guarantee the write will be globally-visible by the

time the

first concurrent accessor shows up?

The function comments say:

*
* Has to be done before any concurrent usage..
*
* No barrier semantics.

(If not, one could (a) do an unlocked ptr->value=0, then the atomic
write, or (b) revert and improve the suppression.) I don't doubt

it's

fine for the ways PostgreSQL uses atomics today, which generally
initialize an atomic before the concurrent-accessor processes even
exist.

I think it's unlikely that there are cases where you could safely
initialize the atomic without needing some form of synchronization
before it can be used. If a barrier were needed, what'd guarantee the
concurrent access happened after the initialization in the first

place?

Suppose the initializing process does:

pg_atomic_init_u64(&somestruct->atomic, 123);
somestruct->atomic_ready = true;

In released versions, any process observing atomic_ready==true will
observe
the results of the pg_atomic_init_u64(). After the commit from this
thread,
that's no longer assured.

Why did that hold true before? There wasn't a barrier in platforms already (wherever we know what 64 bit reads/writes have single copy atomicity).

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#13

Tom Lane

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Andres Freund (#12)

Re: valgrind versus pg_atomic_init()

Andres Freund <andres@anarazel.de> writes:

On June 16, 2020 8:24:29 PM PDT, Noah Misch <noah@leadboat.com> wrote:

Suppose the initializing process does:

pg_atomic_init_u64(&somestruct->atomic, 123);
somestruct->atomic_ready = true;

In released versions, any process observing atomic_ready==true will
observe
the results of the pg_atomic_init_u64(). After the commit from this
thread,
that's no longer assured.

Why did that hold true before? There wasn't a barrier in platforms already (wherever we know what 64 bit reads/writes have single copy atomicity).

I'm confused as to why this is even an interesting discussion. If the
timing is so tight that another process could possibly observe partially-
initialized state in shared memory, how could we have confidence that the
other process doesn't look before we've initialized the atomic variable or
spinlock at all?

I think in practice all we need depend on in this area is that fork()
provides a full memory barrier.

regards, tom lane

#14

Noah Misch

noah@leadboat.com

over 5 years ago

In reply to: Andres Freund (#12)

Re: valgrind versus pg_atomic_init()

On Tue, Jun 16, 2020 at 08:35:58PM -0700, Andres Freund wrote:

On June 16, 2020 8:24:29 PM PDT, Noah Misch <noah@leadboat.com> wrote:

Suppose the initializing process does:

pg_atomic_init_u64(&somestruct->atomic, 123);
somestruct->atomic_ready = true;

In released versions, any process observing atomic_ready==true will
observe
the results of the pg_atomic_init_u64(). After the commit from this
thread,
that's no longer assured.

Why did that hold true before? There wasn't a barrier in platforms already (wherever we know what 64 bit reads/writes have single copy atomicity).

You are right. It didn't hold before.