Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Hi,
During the lwlock scalability work I noticed a longstanding issue with
the lwlock code. LWLockRelease() and the other mentioned locations do
the following to wake up any waiters, without holding the lock's
spinlock:
/*
* Awaken any waiters I removed from the queue.
*/
while (head != NULL)
{
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
}
which means they manipulate the lwWaitLink queue without
protection. That's done intentionally. The code tries to protect against
corruption of the list to do a woken up backend acquiring a lock (this
or an independent one) by only continuing when the lwWaiting flag is set
to false. Unfortunately there's absolutely no guarantee that a) the
assignment to lwWaitLink and lwWaiting are done in that order b) that
the stores are done in-order from the POV of other backends.
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
the reader side already uses an implicit barrier by using spinlocks.
I've fixed this as part 1 of the lwlock scalability work in [1]http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=commitdiff;h=2de11eb11bb3e3777f6d384de0af9c2f77960637, but
Heikki rightfully suggested that a) this should be backpatched b) done
in a separate commit.
There is the question what to do about the branches without barriers? I
guess a SpinLockAcquire()/Release() would do? Or do we decide it's not
important enough to matter, since it's not an issue on x86?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
You didn't really explain why you think that ordering is necessary?
Each proc being awoken will surely see both fields updated, and other
procs won't be examining these fields at all, since we already delinked
all these procs from the LWLock's queue.
There is the question what to do about the branches without barriers? I
guess a SpinLockAcquire()/Release() would do? Or do we decide it's not
important enough to matter, since it's not an issue on x86?
Given the lack of trouble reports that could be traced to this,
I don't feel a need to worry about it in branches that don't
have any barrier support. But in any case, I'm not convinced
there's a bug here at all.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I wrote:
You didn't really explain why you think that ordering is necessary?
Actually, after grepping to check my memory of what those fields are
being used for, I have a bigger question: WTF is xlog.c doing being
so friendly with the innards of LWLocks? Surely this needs to get
refactored so that most of WakeupWaiters() and friends is in lwlock.c.
Or has all notion of modularity gone out the window while I wasn't
looking?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;You didn't really explain why you think that ordering is necessary?
Each proc being awoken will surely see both fields updated, and other
procs won't be examining these fields at all, since we already delinked
all these procs from the LWLock's queue.
The problem is that one the released backends could wake up concurrently
because of a unrelated, or previous PGSemaphoreUnlock(). It could see
lwWaiting = false, and thus wakeup and acquire the lock, even if the
store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
ordering here) yet.
Now, it may well be that there's no practical consequence of that, but I
am not prepared to bet on it.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-02-10 11:20:30 -0500, Tom Lane wrote:
I wrote:
You didn't really explain why you think that ordering is necessary?
Actually, after grepping to check my memory of what those fields are
being used for, I have a bigger question: WTF is xlog.c doing being
so friendly with the innards of LWLocks? Surely this needs to get
refactored so that most of WakeupWaiters() and friends is in lwlock.c.
Or has all notion of modularity gone out the window while I wasn't
looking?
Well, it's not actually using any lwlock.c code, it's a special case
locking logic, just reusing the datastructures. That said, I am not
particularly happy about the amount of code it's duplicating from
lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of
WALInsertSlotAcquireOne() is a copied.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/10/2014 06:41 PM, Andres Freund wrote:
On 2014-02-10 11:20:30 -0500, Tom Lane wrote:
I wrote:
You didn't really explain why you think that ordering is necessary?
Actually, after grepping to check my memory of what those fields are
being used for, I have a bigger question: WTF is xlog.c doing being
so friendly with the innards of LWLocks? Surely this needs to get
refactored so that most of WakeupWaiters() and friends is in lwlock.c.
Or has all notion of modularity gone out the window while I wasn't
looking?Well, it's not actually using any lwlock.c code, it's a special case
locking logic, just reusing the datastructures. That said, I am not
particularly happy about the amount of code it's duplicating from
lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of
WALInsertSlotAcquireOne() is a copied.
I'm not too happy with the amount of copy-paste myself, but there was
enough difference to regular lwlocks that I didn't want to bother all
lwlocks with the xlog-specific stuff either. The WAL insert slots do
share the LWLock-related PGPROC fields though, and semaphore. I'm all
ears if you have ideas on that..
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
On 02/10/2014 06:41 PM, Andres Freund wrote:
Well, it's not actually using any lwlock.c code, it's a special case
locking logic, just reusing the datastructures. That said, I am not
particularly happy about the amount of code it's duplicating from
lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of
WALInsertSlotAcquireOne() is a copied.
I'm not too happy with the amount of copy-paste myself, but there was
enough difference to regular lwlocks that I didn't want to bother all
lwlocks with the xlog-specific stuff either. The WAL insert slots do
share the LWLock-related PGPROC fields though, and semaphore. I'm all
ears if you have ideas on that..
I agree that if the behavior is considerably different, we don't really
want to try to make LWLockAcquire/Release cater to both this and their
standard behavior. But why not add some additional functions in lwlock.c
that do what xlog wants? If we're going to have mostly-copy-pasted logic,
it'd at least be better if it was in the same file, and not somewhere
that's not even in the same major subtree.
Also, the reason that LWLock isn't an abstract struct is because we wanted
to be able to embed it in other structs; *not* as license for other
modules to fool with its contents. If we were working in C++ we'd
certainly have made all its fields private.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/10/2014 03:46 PM, Andres Freund wrote:
Hi,
During the lwlock scalability work I noticed a longstanding issue with
the lwlock code. LWLockRelease() and the other mentioned locations do
the following to wake up any waiters, without holding the lock's
spinlock:
/*
* Awaken any waiters I removed from the queue.
*/
while (head != NULL)
{
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
}which means they manipulate the lwWaitLink queue without
protection. That's done intentionally. The code tries to protect against
corruption of the list to do a woken up backend acquiring a lock (this
or an independent one) by only continuing when the lwWaiting flag is set
to false. Unfortunately there's absolutely no guarantee that a) the
assignment to lwWaitLink and lwWaiting are done in that order b) that
the stores are done in-order from the POV of other backends.
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
the reader side already uses an implicit barrier by using spinlocks.I've fixed this as part 1 of the lwlock scalability work in [1], but
Heikki rightfully suggested that a) this should be backpatched b) done
in a separate commit.There is the question what to do about the branches without barriers? I
guess a SpinLockAcquire()/Release() would do?
The other alternative we discussed on IM is to unlink the waiters from
the linked list, while still holding the spinlock. We can't do the
PGSemaphoreUnlock() call to actually wake up the waiters while holding
the spinlock, but all the shared memory manipulations we could. It would
move all the modifications of the shared structures under the spinlock,
which feels comforting.
It would require using some-sort of a backend-private data structure to
hold the list of processes to wake up. We don't want to palloc() in
LWLockRelease(), but we could malloc() a large-enough array once at
process initialization, and use that on all LWLockRelease() calls.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/10/2014 08:03 PM, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
On 02/10/2014 06:41 PM, Andres Freund wrote:
Well, it's not actually using any lwlock.c code, it's a special case
locking logic, just reusing the datastructures. That said, I am not
particularly happy about the amount of code it's duplicating from
lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of
WALInsertSlotAcquireOne() is a copied.I'm not too happy with the amount of copy-paste myself, but there was
enough difference to regular lwlocks that I didn't want to bother all
lwlocks with the xlog-specific stuff either. The WAL insert slots do
share the LWLock-related PGPROC fields though, and semaphore. I'm all
ears if you have ideas on that..I agree that if the behavior is considerably different, we don't really
want to try to make LWLockAcquire/Release cater to both this and their
standard behavior. But why not add some additional functions in lwlock.c
that do what xlog wants? If we're going to have mostly-copy-pasted logic,
it'd at least be better if it was in the same file, and not somewhere
that's not even in the same major subtree.
Ok, I'll try to refactor it that way, so that we can see if it looks better.
Also, the reason that LWLock isn't an abstract struct is because we wanted
to be able to embed it in other structs; *not* as license for other
modules to fool with its contents. If we were working in C++ we'd
certainly have made all its fields private.
Um, xlog.c is doing no such thing. The insertion slots use a struct of
their own, which is also copy-pasted from LWLock (with one additional
field).
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-02-10 19:48:47 +0200, Heikki Linnakangas wrote:
On 02/10/2014 06:41 PM, Andres Freund wrote:
On 2014-02-10 11:20:30 -0500, Tom Lane wrote:
I wrote:
You didn't really explain why you think that ordering is necessary?
Actually, after grepping to check my memory of what those fields are
being used for, I have a bigger question: WTF is xlog.c doing being
so friendly with the innards of LWLocks? Surely this needs to get
refactored so that most of WakeupWaiters() and friends is in lwlock.c.
Or has all notion of modularity gone out the window while I wasn't
looking?Well, it's not actually using any lwlock.c code, it's a special case
locking logic, just reusing the datastructures. That said, I am not
particularly happy about the amount of code it's duplicating from
lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of
WALInsertSlotAcquireOne() is a copied.I'm not too happy with the amount of copy-paste myself, but there was enough
difference to regular lwlocks that I didn't want to bother all lwlocks with
the xlog-specific stuff either. The WAL insert slots do share the
LWLock-related PGPROC fields though, and semaphore. I'm all ears if you have
ideas on that..
The lwlock scalability stuff has separated out the enqueue/wakeup code,
that probably should work here as well? And that's a fair portion of the
code. I think it should be doable to make that generic enough that the
actual difference of the struct doesn't matter. It'd also reduce
duplication of LWLockAcquire, ConditionalAcquire, OrWait.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: "Andres Freund" <andres@2ndquadrant.com>
which means they manipulate the lwWaitLink queue without
protection. That's done intentionally. The code tries to protect against
corruption of the list to do a woken up backend acquiring a lock (this
or an independent one) by only continuing when the lwWaiting flag is set
to false. Unfortunately there's absolutely no guarantee that a) the
assignment to lwWaitLink and lwWaiting are done in that order b) that
the stores are done in-order from the POV of other backends.
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
the reader side already uses an implicit barrier by using spinlocks.
I've got a report from one customer that they encountered a hang during
performance benchmarking. They were using PostgreSQL 9.2.4. I remember
that the stack trace showed many backends blocked forever at LWLockAcuuire()
during btree insert operation. I'm not sure this has something to do with
what you are raising, but the release notes for 9.2.5/6 doesn't suggest any
fixes for this. So I felt there is something wrong with lwlocks.
Do you think that your question could cause my customer's problem --
backends block at lwlock forever?
Regards
MauMau
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2014-02-11 21:46:04 +0900, MauMau wrote:
From: "Andres Freund" <andres@2ndquadrant.com>
which means they manipulate the lwWaitLink queue without
protection. That's done intentionally. The code tries to protect against
corruption of the list to do a woken up backend acquiring a lock (this
or an independent one) by only continuing when the lwWaiting flag is set
to false. Unfortunately there's absolutely no guarantee that a) the
assignment to lwWaitLink and lwWaiting are done in that order b) that
the stores are done in-order from the POV of other backends.
So what we need to do is to acquire a write barrier between the
assignments to lwWaitLink and lwWaiting, i.e.
proc->lwWaitLink = NULL;
pg_write_barrier();
proc->lwWaiting = false;
the reader side already uses an implicit barrier by using spinlocks.I've got a report from one customer that they encountered a hang during
performance benchmarking. They were using PostgreSQL 9.2.4. I remember
that the stack trace showed many backends blocked forever at LWLockAcuuire()
during btree insert operation. I'm not sure this has something to do with
what you are raising, but the release notes for 9.2.5/6 doesn't suggest any
fixes for this. So I felt there is something wrong with lwlocks.Do you think that your question could cause my customer's problem --
backends block at lwlock forever?
It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
to
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);
which it is permitted to do, yes, that could cause symptoms like you
describe.
Any chance you have the binaries the customer ran back then around?
Disassembling that piece of code might give you a hint whether that's a
possible cause.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: "Andres Freund" <andres@2ndquadrant.com>
It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
to
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);
which it is permitted to do, yes, that could cause symptoms like you
describe.
Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for
x86_64. The PostgreSQL was built with GCC.
Any chance you have the binaries the customer ran back then around?
Disassembling that piece of code might give you a hint whether that's a
possible cause.
I'm sorry I can't provide the module, but I attached the disassembled code
code for lwlockRelease and LWLockAcquire in the executable. I'm not sure
this proves something.
FYI, the following stack traces are the ones obtained during two instances
of hang.
#0 0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1 0x0000000000614707 in PGSemaphoreLock ()
#2 0x0000000000659d5b in LWLockAcquire ()
#3 0x000000000047983d in RelationGetBufferForTuple ()
#4 0x0000000000477f86 in heap_insert ()
#5 0x00000000005a4a12 in ExecModifyTable ()
#6 0x000000000058d928 in ExecProcNode ()
#7 0x000000000058c762 in standard_ExecutorRun ()
#8 0x00007f0cb37f99cb in pgss_ExecutorRun () from
/opt/symfoserver64/lib/pg_stat_statements.so
#9 0x00007f0cb357f545 in explain_ExecutorRun () from
/opt/symfoserver64/lib/auto_explain.so
#10 0x000000000066a59e in ProcessQuery ()
#11 0x000000000066a7ef in PortalRunMulti ()
#12 0x000000000066afd2 in PortalRun ()
#13 0x0000000000666fcb in exec_simple_query ()
#14 0x0000000000668058 in PostgresMain ()
#15 0x0000000000622ef1 in PostmasterMain ()
#16 0x00000000005c0723 in main ()
#0 0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1 0x0000000000614707 in PGSemaphoreLock ()
#2 0x0000000000659d5b in LWLockAcquire ()
#3 0x000000000047983d in RelationGetBufferForTuple ()
#4 0x0000000000477f86 in heap_insert ()
#5 0x00000000005a4a12 in ExecModifyTable ()
#6 0x000000000058d928 in ExecProcNode ()
#7 0x000000000058c762 in standard_ExecutorRun ()
#8 0x00007f0cb37f99cb in pgss_ExecutorRun () from
/opt/symfoserver64/lib/pg_stat_statements.so
#9 0x00007f0cb357f545 in explain_ExecutorRun () from
/opt/symfoserver64/lib/auto_explain.so
#10 0x000000000066a59e in ProcessQuery ()
#11 0x000000000066a7ef in PortalRunMulti ()
#12 0x000000000066afd2 in PortalRun ()
#13 0x0000000000666fcb in exec_simple_query ()
#14 0x0000000000668058 in PostgresMain ()
#15 0x0000000000622ef1 in PostmasterMain ()
#16 0x00000000005c0723 in main ()
#0 0x00000036102eaf77 in semop () from /lib64/libc.so.6
#1 0x0000000000614707 in PGSemaphoreLock ()
#2 0x0000000000659d5b in LWLockAcquire ()
#3 0x000000000064bb8c in ProcArrayEndTransaction ()
#4 0x0000000000491216 in CommitTransaction ()
#5 0x00000000004925a5 in CommitTransactionCommand ()
#6 0x0000000000664cf7 in finish_xact_command ()
#7 0x0000000000667145 in exec_simple_query ()
#8 0x0000000000668058 in PostgresMain ()
#9 0x0000000000622ef1 in PostmasterMain ()
#10 0x00000000005c0723 in main ()
Regards
MauMau
Attachments:
lwlock_disassemble.txttext/plain; format=flowed; name=lwlock_disassemble.txt; reply-type=originalDownload
On 2014-02-12 20:55:32 +0900, MauMau wrote:
Dump of assembler code for function LWLockRelease:
could you try if you get more readable dumps by using disassemble/m?
That might at least print line numbers if you have debug info installed.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: "Andres Freund" <andres@2ndquadrant.com>
could you try if you get more readable dumps by using disassemble/m?
That might at least print line numbers if you have debug info installed.
OK, I'll try that tomorrow. However, the debug info is not available,
because they use PostgreSQL built by themselves, not the community RPM nor
EnterpriseDB's installer.
Regards
MauMau
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Feb12, 2014, at 12:55 , MauMau <maumau307@gmail.com> wrote:
From: "Andres Freund" <andres@2ndquadrant.com>
It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
to
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);
which it is permitted to do, yes, that could cause symptoms like you
describe.Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64.
The PostgreSQL was built with GCC.
The relevant part of the disassembled binary you attached seems to be
Dump of assembler code for function LWLockRelease:
...
0x0000000000647f47 <LWLockRelease+519>: lea 0x10(%rcx),%rdi
0x0000000000647f4b <LWLockRelease+523>: movq $0x0,0x48(%rcx)
0x0000000000647f53 <LWLockRelease+531>: movb $0x0,0x41(%rcx)
0x0000000000647f57 <LWLockRelease+535>: callq 0x606210 <PGSemaphoreUnlock>
I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity
and lwWaiting a single-byte quantity, it's pretty much certain that the
first store updates lwWaitLink and the second lwWaiting. Thus, no reordering
seems to have taken place here...
best regards,
Florian Pflug
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/12/2014 05:42 PM, Florian Pflug wrote:
On Feb12, 2014, at 12:55 , MauMau <maumau307@gmail.com> wrote:
From: "Andres Freund" <andres@2ndquadrant.com>
It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
to
LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
proc = head;
proc->lwWaiting = false;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
PGSemaphoreUnlock(&proc->sem);
which it is permitted to do, yes, that could cause symptoms like you
describe.Yes, the hang occurred with 64-bit PostgreSQL 9.2.4 running on RHEL6 for x86_64.
The PostgreSQL was built with GCC.The relevant part of the disassembled binary you attached seems to be
Dump of assembler code for function LWLockRelease:
...
0x0000000000647f47 <LWLockRelease+519>: lea 0x10(%rcx),%rdi
0x0000000000647f4b <LWLockRelease+523>: movq $0x0,0x48(%rcx)
0x0000000000647f53 <LWLockRelease+531>: movb $0x0,0x41(%rcx)
0x0000000000647f57 <LWLockRelease+535>: callq 0x606210 <PGSemaphoreUnlock>I haven't checked the offsets, but since lwWaitLink is an 8-byte quantity
and lwWaiting a single-byte quantity, it's pretty much certain that the
first store updates lwWaitLink and the second lwWaiting. Thus, no reordering
seems to have taken place here...best regards,
Florian Pflug
Even if reordering was not done by compiler, it still can happen while execution.
There is no warranty that two subsequent assignments will be observed by all CPU cores in the same order.
So if one thread is performing
p->x = 1;
p->y = 2;
p->x = 3;
p->y = 4;
then other thread can see the following combinations of (x,y):
(1,2)
(1,4)
(3,2)
(3,4)
It is necessary to explicitly insert write barrier to prevent such non-deterministic behaviour.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Feb 12, 2014 at 4:04 PM, knizhnik <knizhnik@garret.ru> wrote:
Even if reordering was not done by compiler, it still can happen while
execution.
There is no warranty that two subsequent assignments will be observed by all
CPU cores in the same order.
The x86 memory model (total store order) provides that guarantee in
this specific case.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
eFrom: "Andres Freund" <andres@2ndquadrant.com>
could you try if you get more readable dumps by using disassemble/m?
That might at least print line numbers if you have debug info installed.
Please find the attached file. I hope this will reveal something.
Regards
MauMau
Attachments:
lwlock_disassemble.txttext/plain; format=flowed; name=lwlock_disassemble.txt; reply-type=originalDownload
On 02/12/2014 06:10 PM, Ants Aasma wrote:
On Wed, Feb 12, 2014 at 4:04 PM, knizhnik <knizhnik@garret.ru> wrote:
Even if reordering was not done by compiler, it still can happen while
execution.
There is no warranty that two subsequent assignments will be observed by all
CPU cores in the same order.The x86 memory model (total store order) provides that guarantee in
this specific case.Regards,
Ants Aasma
Sorry, I thought that we are talking about general case, not just x86 architecture.
May be I do not understand something in LWlock code, but it seems to me that assigning NULL to proc->lwWaitLink is not needed at all:
while (head != NULL)
{
LOG_LWDEBUG("LWLockRelease", lockid, "release waiter");
proc = head;
head = proc->lwWaitLink;
proc->lwWaitLink = NULL;
proc->lwWaiting = false;
PGSemaphoreUnlock(&proc->sem);
}
This part of L1 list is not traversed by any other processor. So nobody will inspect this field. When awakened process needs to wait for another lock,
it will just assign NULL to this field itself:
proc->lwWaiting = 1;
proc->lwWaitMode = mode;
proc->lwWaitLink = NULL;
if (lock->head == NULL)
lock->head = proc;
else
lock->tail->lwWaitLink = proc;
lock->tail = proc;
Without TSO (total store order), such assignment of lwWaitLink in LWLockRlease outside critical section may just corrupt L1 list, in which awakened process is already linked.
But I am not sure that elimination of this assignment will be enough to ensure correctness of this code without TSO.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers