Latches and barriers
Hi,
latch.h has the following comment:
* Presently, when using a shared latch for interprocess signalling, the
* flag variable(s) set by senders and inspected by the wait loop must
* be protected by spinlocks or LWLocks, else it is possible to miss events
* on machines with weak memory ordering (such as PPC). This restriction
* will be lifted in future by inserting suitable memory barriers into
* SetLatch and ResetLatch.
and unix_latch.c has:
SetLatch(volatile Latch *latch)
{
pid_t owner_pid;
/*
* XXX there really ought to be a memory barrier operation right here, to
* ensure that any flag variables we might have changed get flushed to
* main memory before we check/set is_set. Without that, we have to
* require that callers provide their own synchronization for machines
* with weak memory ordering (see latch.h).
*/
/* Quick exit if already set */
if (latch->is_set)
return;
...
void
ResetLatch(volatile Latch *latch)
{
/* Only the owner should reset the latch */
Assert(latch->owner_pid == MyProcPid);
latch->is_set = false;
/*
* XXX there really ought to be a memory barrier operation right here, to
* ensure that the write to is_set gets flushed to main memory before we
* examine any flag variables. Otherwise a concurrent SetLatch might
* falsely conclude that it needn't signal us, even though we have missed
* seeing some flag updates that SetLatch was supposed to inform us of.
* For the moment, callers must supply their own synchronization of flag
* variables (see latch.h).
*/
}
Triggered by another thread I converted proc.c and lwlock.c to use
latches for blocking. Which worked fine on my laptop, but failed
miserably, often within less than a second, on my 2 socket x86
workstation. After a fair amount of headscratching I figured out that
it's indeed those missing barriers. Adding them made it work.
Thinking about it, it's not too surprising. PGPROC's lwWaiting and
procLatch aren't at the same address (more specifically on a different
cacheline). X86 allows reordering of loads with stores to different
addresses. That's what happening here.
While it might not be required for existing latch uses (I'm *not* sure
that's true), I still think that we should fix those XXX by actually
using barriers now that we have them. I don't think we want every
callsite worry about using barriers.
Agreed?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
While it might not be required for existing latch uses (I'm *not* sure
that's true), I still think that we should fix those XXX by actually
using barriers now that we have them. I don't think we want every
callsite worry about using barriers.
Agreed?
Yeah, now that we have barrier code we think works, we should definitely
put some in there. The only reason it's like that is we didn't have
any real barrier support at the time.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-01-12 11:03:42 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
While it might not be required for existing latch uses (I'm *not* sure
that's true)
I think at least syncrep.c might not be correct. In SyncRepWakeQueue()
it sets PGPROC->syncRepState without the necessary barriers (via locks),
although it does use them in SyncRepWaitForLSN().
It is, perhaps surprisingly to many, not sufficient to take a spinlock,
change the flag, release it and then set the latch - the release alone
doesn't guarantee a sufficient barrier unless looking at the flag is
also protected by the spinlock.
I still think that we should fix those XXX by actually
using barriers now that we have them. I don't think we want every
callsite worry about using barriers.Agreed?
Yeah, now that we have barrier code we think works, we should definitely
put some in there. The only reason it's like that is we didn't have
any real barrier support at the time.
Master only though? If we decide we need it earlier, we can backport
that commit lateron...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 12, 2015 at 11:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-12 11:03:42 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
While it might not be required for existing latch uses (I'm *not* sure
that's true)I think at least syncrep.c might not be correct. In SyncRepWakeQueue()
it sets PGPROC->syncRepState without the necessary barriers (via locks),
although it does use them in SyncRepWaitForLSN().It is, perhaps surprisingly to many, not sufficient to take a spinlock,
change the flag, release it and then set the latch - the release alone
doesn't guarantee a sufficient barrier unless looking at the flag is
also protected by the spinlock.
I thought we decided that a spinlock acquire or release should be a
full barrier.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-01-12 11:03:42 -0500, Tom Lane wrote:
Yeah, now that we have barrier code we think works, we should definitely
put some in there. The only reason it's like that is we didn't have
any real barrier support at the time.
Master only though? If we decide we need it earlier, we can backport
that commit lateron...
We've not been back-patching barrier fixes have we? I'm hesitant to
put any of that stuff into released branches until it's had more time
to age.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-01-12 12:44:56 -0500, Robert Haas wrote:
On Mon, Jan 12, 2015 at 11:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-01-12 11:03:42 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
While it might not be required for existing latch uses (I'm *not* sure
that's true)I think at least syncrep.c might not be correct. In SyncRepWakeQueue()
it sets PGPROC->syncRepState without the necessary barriers (via locks),
although it does use them in SyncRepWaitForLSN().It is, perhaps surprisingly to many, not sufficient to take a spinlock,
change the flag, release it and then set the latch - the release alone
doesn't guarantee a sufficient barrier unless looking at the flag is
also protected by the spinlock.I thought we decided that a spinlock acquire or release should be a
full barrier.
Acquire + release, yes. But not release alone? The x86 spinlock release
currently is just a store - that won't ever be a full barrier.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-01-12 12:49:53 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-01-12 11:03:42 -0500, Tom Lane wrote:
Yeah, now that we have barrier code we think works, we should definitely
put some in there. The only reason it's like that is we didn't have
any real barrier support at the time.Master only though? If we decide we need it earlier, we can backport
that commit lateron...We've not been back-patching barrier fixes have we?
Not fully at least. And I'm not sure that's good, because at least
bgworker.c has relied on working barriers for a while and 9.4 introduced
a couple more uses.
I guess we should fix up the backbranch versions in a while.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers