LWLock deadlock and gdb advice
I have a 9.5alpha1 cluster which is locked up. All the user backends seem
to be waiting in semop, ultimately from WALInsertLockAcquire.
Is there a way to use gdb to figure out who holds the lock they are waiting
for?
It is compiled with both --enable-debug and --enable-cassert.
I am hoping someone can give me a recipe similar to the one for "Examining
backend memory use" in https://wiki.postgresql.org/wiki/Developer_FAQ
Example backtrace:
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x000000000067190f in PGSemaphoreLock (sema=0x7f28a98b9468) at
pg_sema.c:387
#2 0x00000000006d4b0c in LWLockAcquireCommon (l=0x7f28a0e6d600,
valptr=0x7f28a0e6d618, val=0) at lwlock.c:1042
#3 LWLockAcquireWithVar (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0)
at lwlock.c:916
#4 0x00000000004f3c4f in WALInsertLockAcquire (rdata=0xc5c130, fpw_lsn=0)
at xlog.c:1411
#5 XLogInsertRecord (rdata=0xc5c130, fpw_lsn=0) at xlog.c:948
#6 0x00000000004f7aac in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#7 0x000000000047e0b0 in ginPlaceToPage (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#8 0x000000000047f3ad in ginInsertValue (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=0x7fffca926460, buildStats=0x0) at
ginbtree.c:748
#9 0x0000000000475c8b in ginEntryInsert (ginstate=0x7fffca9267e0,
attnum=29784, key=1, category=<value optimized out>, items=0x7f28a0c7b458,
nitem=47, buildStats=0x0)
at gininsert.c:234
#10 0x0000000000485ecc in ginInsertCleanup (ginstate=0x7fffca9267e0,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#11 0x0000000000487059 in ginHeapTupleFastInsert (ginstate=0x7fffca9267e0,
collector=<value optimized out>) at ginfast.c:436
#12 0x00000000004760fa in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
Cheers,
Jeff
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are waiting
for?
Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
Otherwise, I suggest dereferencing the "l" argument to
LWLockAcquireWithVar() or something -- set the frame to "3" in your
example backtrace, using "f 3". Then, "p *l". You'll get the tranche
id there, and so on.
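(Roughly, using the backtrace above -- the frame line is taken from Jeff's
trace, and the printed values are only illustrative:)
(gdb) f 3
#3 LWLockAcquireWithVar (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0) at lwlock.c:916
(gdb) p *l
$1 = {mutex = 0 '\000', tranche = 1, ...}
(gdb) p l->tranche
$2 = 1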
--
Peter Geoghegan
On Tue, Jun 30, 2015 at 6:25 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are
waiting
for?
Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
If you define LOCK_DEBUG, then you can check the owner of the lock [1],
which will tell you the last exclusive owner of that lock
and can help you in debugging the problem.
[1]:
#ifdef LOCK_DEBUG
struct PGPROC *owner; /* last exclusive owner of the lock */
#endif
} LWLock;
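(In gdb that looks something like the following once the server is rebuilt
with LOCK_DEBUG; "lock" stands for whatever the LWLock pointer is called in
the frame you have selected, the printed values are illustrative, and keep
in mind the field records the *last* exclusive owner, so it can be stale
for a lock that is not currently held exclusively:)
(gdb) p lock->owner
$1 = (struct PGPROC *) 0x7f70e9e2d260
(gdb) p lock->owner->pid
$2 = 24820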
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are
waiting
for?
Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
I hadn't planned on running into this problem, so had not compiled
accordingly.
I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.
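(For reference, trace_lwlocks is a developer GUC that only exists in
LOCK_DEBUG builds; turning it on is an ordinary setting, e.g. in
postgresql.conf:)
trace_lwlocks = on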
Since I messed up the gdb session causing the postmaster to SIGKILL all the
children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.
Thanks,
Jeff
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are
waiting
for?
Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
I hadn't planned on running into this problem, so had not compiled
accordingly.
I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.
Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.
I've gotten the LWLock deadlock again. User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c).
The checkpointer process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.
I'm not sure where to go from here.
24841:
(gdb) f 2
(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11fae40, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13df080,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock =
0x7f70e11d4280,
mode = LW_EXCLUSIVE}, {lock = 0x0, mode = LW_EXCLUSIVE} <repeats 196
times>}
(gdb) p T_NAME((LWLock *) held_lwlocks[1])
$2 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) held_lwlocks[1])
(gdb) p lock
$3 = (LWLock *) 0x7f70e13df000
(gdb) p T_NAME((LWLock *) lock)
$7 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) lock)
$8 = 6
(gdb) p *(lock)
$1 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters
= {value = 2}, waiters = {head = {prev = 0x7f70e9e29e58, next =
0x7f70e9e2a140}},
owner = 0x7f70e9e2d260}
(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539754537, semNum =
0}, waitStatus = 0, procLatch = {is_set = 0, is_shared = 1 '\001',
owner_pid = 24820}, lxid = 0,
pid = 24820, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0,
recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0
'\000', lwWaitLink = {
prev = 0x7f70e13df090, next = 0x7f70e13df090}, waitLock = 0x0,
waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0,
syncRepState = 0, syncRepLinks = {prev = 0x0,
next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2d2f0, next =
0x7f70e9e2d2f0}, {prev = 0x7f70e9e2d300, next = 0x7f70e9e2d300}, {prev =
0x7f70e9e2d310, next = 0x7f70e9e2d310}, {
prev = 0x7f70e9e2d320, next = 0x7f70e9e2d320}, {prev =
0x7f70e9e2d330, next = 0x7f70e9e2d330}, {prev = 0x7f70e9e2d340, next =
0x7f70e9e2d340}, {prev = 0x7f70e9e2d350,
next = 0x7f70e9e2d350}, {prev = 0x7f70e9e2d360, next =
0x7f70e9e2d360}, {prev = 0x7f70e9e2d370, next = 0x7f70e9e2d370}, {prev =
0x7f70e9e2d380, next = 0x7f70e9e2d380}, {
prev = 0x7f70e9e2d390, next = 0x7f70e9e2d390}, {prev =
0x7f70e9e2d3a0, next = 0x7f70e9e2d3a0}, {prev = 0x7f70e9e2d3b0, next =
0x7f70e9e2d3b0}, {prev = 0x7f70e9e2d3c0,
next = 0x7f70e9e2d3c0}, {prev = 0x7f70e9e2d3d0, next =
0x7f70e9e2d3d0}, {prev = 0x7f70e9e2d3e0, next = 0x7f70e9e2d3e0}}, subxids =
{xids = {0 <repeats 64 times>}},
backendLock = 0x7f70e13dad00, fpLockBits = 0, fpRelId = {0 <repeats 16
times>}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0}
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
pg_sema.c:387
#2 0x00000000006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3 0x00000000004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
xlog.c:1576
#4 0x00000000004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
opportunistic=<value optimized out>) at xlog.c:1888
#5 0x00000000004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6 0x00000000004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
fpw_lsn=<value optimized out>) at xlog.c:1313
#7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8 0x00000000004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9 0x000000000047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#10 0x000000000047f50d in ginInsertValue (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
ginbtree.c:748
#11 0x0000000000475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
nitem=51, buildStats=0x0)
at gininsert.c:234
#12 0x000000000048602c in ginInsertCleanup (ginstate=0x7fffb33cc470,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x00000000004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470,
collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047625a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
#15 0x00000000007dc483 in FunctionCall6Coll (flinfo=<value optimized out>,
collation=<value optimized out>, arg1=<value optimized out>, arg2=<value
optimized out>,
arg3=<value optimized out>, arg4=<value optimized out>,
arg5=140122789534360, arg6=0) at fmgr.c:1436
#16 0x00000000004b2247 in index_insert (indexRelation=0x7f70e1190e60,
values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794,
heapRelation=0x7f70e1185a98,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
#17 0x00000000005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0,
tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0,
arbiterIndexes=0x0)
at execIndexing.c:384
#18 0x00000000005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0,
slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768,
canSetTag=1 '\001')
at nodeModifyTable.c:978
#19 0x00000000005f0b2a in ExecModifyTable (node=0x1a409d0) at
nodeModifyTable.c:1436
#20 0x00000000005d6cc8 in ExecProcNode (node=0x1a409d0) at
execProcnode.c:389
#21 0x00000000005d5402 in ExecutePlan (queryDesc=0x1a2b220,
direction=<value optimized out>, count=0) at execMain.c:1549
#22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized
out>, count=0) at execMain.c:337
#23 0x00007f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
#24 0x00000000006ea38f in ProcessQuery (plan=0x1a2b3d8,
sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1",
params=0x1a2afe8, dest=<value optimized out>,
completionTag=0x7fffb33cf660 "") at pquery.c:185
#25 0x00000000006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1
'\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "")
at pquery.c:1279
#26 0x00000000006eadb3 in PortalRun (portal=0x19a6128,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630,
altdest=0x1990630, completionTag=0x7fffb33cf660 "")
at pquery.c:816
#27 0x00000000006e6ffb in exec_execute_message (portal_name=0x1990218 "",
max_rows=9223372036854775807) at postgres.c:1988
#28 0x00000000006e8c15 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value
optimized out>) at postgres.c:4088
#29 0x00000000006855dd in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4159
#30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3835
#31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1609
#32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#33 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221
24820:
(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11d4340, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13ded00,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e13ded80, mode = LW_EXCLUSIVE}, {lock =
0x7f70e13dee00,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e13dee80, mode = LW_EXCLUSIVE},
{lock = 0x7f70e13def00, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13def80, mode
= LW_EXCLUSIVE}, {
lock = 0x7f70e13df000, mode = LW_EXCLUSIVE}, {lock = 0x0, mode =
LW_EXCLUSIVE} <repeats 192 times>}
(gdb) p lock
$3 = (LWLock *) 0x7f70e13df080
(gdb) p *lock
$4 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters
= {value = 1}, waiters = {head = {prev = 0x7f70e9e2d2a8, next =
0x7f70e9e2d2a8}},
owner = 0x7f70e9e2a0f8}
(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539688999, semNum =
15}, waitStatus = 0, procLatch = {is_set = 1, is_shared = 1 '\001',
owner_pid = 24841}, lxid = 5523,
pid = 24841, pgprocno = 95, backendId = 9, databaseId = 16384, roleId =
10, recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode =
2 '\002', lwWaitLink = {
prev = 0x7f70e13df010, next = 0x7f70e9e29e58}, waitLock = 0x0,
waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0,
syncRepState = 0, syncRepLinks = {prev = 0x0,
next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2a188, next =
0x7f70e9e2a188}, {prev = 0x7f70e9e2a198, next = 0x7f70e9e2a198}, {prev =
0x7f70e9e2a1a8, next = 0x7f70e9e2a1a8}, {
prev = 0x7f70e9bfe060, next = 0x7f70e9bfe060}, {prev =
0x7f70e9e2a1c8, next = 0x7f70e9e2a1c8}, {prev = 0x7f70e9e2a1d8, next =
0x7f70e9e2a1d8}, {prev = 0x7f70e9e2a1e8,
next = 0x7f70e9e2a1e8}, {prev = 0x7f70e9e2a1f8, next =
0x7f70e9e2a1f8}, {prev = 0x7f70e9e2a208, next = 0x7f70e9e2a208}, {prev =
0x7f70e9e2a218, next = 0x7f70e9e2a218}, {
prev = 0x7f70e9e2a228, next = 0x7f70e9e2a228}, {prev =
0x7f70e9e2a238, next = 0x7f70e9e2a238}, {prev = 0x7f70e9e2a248, next =
0x7f70e9e2a248}, {prev = 0x7f70e9e2a258,
next = 0x7f70e9e2a258}, {prev = 0x7f70e9e2a268, next =
0x7f70e9e2a268}, {prev = 0x7f70e9e2a278, next = 0x7f70e9e2a278}}, subxids =
{xids = {0 <repeats 64 times>}},
backendLock = 0x7f70e13da8c0, fpLockBits = 160528697655296, fpRelId = {0,
0, 0, 0, 0, 0, 0, 0, 0, 2679, 2610, 16414, 16412, 16416, 16415, 16409},
fpVXIDLock = 1 '\001',
fpLocalTransactionId = 5523}
(gdb) bt
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2d270) at
pg_sema.c:387
#2 0x00000000006d70ac in LWLockAcquireCommon (lock=0x7f70e13df080,
mode=LW_EXCLUSIVE, valptr=0x7f70e13df0a8, val=0) at lwlock.c:1042
#3 0x00000000004e6dab in WALInsertLockAcquireExclusive () at xlog.c:1449
#4 0x00000000004ef025 in CreateCheckPoint (flags=64) at xlog.c:8177
#5 0x00000000006789d3 in CheckpointerMain () at checkpointer.c:493
#6 0x00000000004fecec in AuxiliaryProcessMain (argc=2,
argv=0x7fffb33cf1d0) at bootstrap.c:428
#7 0x0000000000680697 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5143
#8 0x0000000000682b7a in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2656
#9 <signal handler called>
#10 0x0000003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#11 0x00000000006849bf in ServerLoop (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1573
#12 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#13 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221
Thanks,
Jeff
Jeff Janes wrote:
I've gotten the LWLock deadlock again. User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c).
Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
pg_sema.c:387
#2 0x00000000006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3 0x00000000004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
xlog.c:1576
#4 0x00000000004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
opportunistic=<value optimized out>) at xlog.c:1888
#5 0x00000000004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6 0x00000000004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
fpw_lsn=<value optimized out>) at xlog.c:1313
#7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8 0x00000000004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9 0x000000000047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#10 0x000000000047f50d in ginInsertValue (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
ginbtree.c:748
#11 0x0000000000475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
nitem=51, buildStats=0x0)
at gininsert.c:234
#12 0x000000000048602c in ginInsertCleanup (ginstate=0x7fffb33cc470,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x00000000004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470,
collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047625a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
#15 0x00000000007dc483 in FunctionCall6Coll (flinfo=<value optimized out>,
collation=<value optimized out>, arg1=<value optimized out>, arg2=<value
optimized out>,
arg3=<value optimized out>, arg4=<value optimized out>,
arg5=140122789534360, arg6=0) at fmgr.c:1436
#16 0x00000000004b2247 in index_insert (indexRelation=0x7f70e1190e60,
values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794,
heapRelation=0x7f70e1185a98,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
#17 0x00000000005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0,
tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0,
arbiterIndexes=0x0)
at execIndexing.c:384
#18 0x00000000005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0,
slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768,
canSetTag=1 '\001')
at nodeModifyTable.c:978
#19 0x00000000005f0b2a in ExecModifyTable (node=0x1a409d0) at
nodeModifyTable.c:1436
#20 0x00000000005d6cc8 in ExecProcNode (node=0x1a409d0) at
execProcnode.c:389
#21 0x00000000005d5402 in ExecutePlan (queryDesc=0x1a2b220,
direction=<value optimized out>, count=0) at execMain.c:1549
#22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized
out>, count=0) at execMain.c:337
#23 0x00007f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
#24 0x00000000006ea38f in ProcessQuery (plan=0x1a2b3d8,
sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1",
params=0x1a2afe8, dest=<value optimized out>,
completionTag=0x7fffb33cf660 "") at pquery.c:185
#25 0x00000000006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1
'\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "")
at pquery.c:1279
#26 0x00000000006eadb3 in PortalRun (portal=0x19a6128,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630,
altdest=0x1990630, completionTag=0x7fffb33cf660 "")
at pquery.c:816
#27 0x00000000006e6ffb in exec_execute_message (portal_name=0x1990218 "",
max_rows=9223372036854775807) at postgres.c:1988
#28 0x00000000006e8c15 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value
optimized out>) at postgres.c:4088
#29 0x00000000006855dd in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4159
#30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3835
#31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1609
#32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#33 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/30/2015 07:05 PM, Jeff Janes wrote:
On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Is there a way to use gdb to figure out who holds the lock they are
waiting
for?
Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.
I hadn't planned on running into this problem, so had not compiled
accordingly.
I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.
Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.
I've gotten the LWLock deadlock again. User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c).
The checkpointer process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.
I'm not sure where to go from here.
The user backend 24841 is waiting in an LWLockWaitForVar() call, on
WALInsertLock 6, with oldval == 0. The checkpointer is holding
WALInsertLock 6, but it has set the insertingAt value to PG_UINT64_MAX.
The LWLockWaitForVar() call should not be blocked on that, because 0 !=
PG_UINT64_MAX.
Looks like the LWLock-scalability patch that made LWLock acquisition
use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is
supposed to guarantee that it always wakes up from sleep when the
variable's value changes. But there is now a race condition in
LWLockWaitForVar() that wasn't there in 9.4:
if (mustwait)
{
/*
* Perform comparison using spinlock as we can't rely on atomic 64
* bit reads/stores.
*/
#ifdef LWLOCK_STATS
lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
SpinLockAcquire(&lock->mutex);
#endif
/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;
if (!mustwait)
break; /* the lock was free or value didn't match */
/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
After the spinlock is released above, but before the LWLockQueueSelf()
call, it's possible that another backend comes in, acquires the lock,
changes the variable's value, and releases the lock again. In 9.4, the
spinlock was not released until the process was queued.
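(One concrete interleaving that fits in that window, written out as
comments; A is the LWLockWaitForVar() caller, B a backend holding the lock
exclusively -- this is an illustration of the description above, not quoted
code:)
/* A: SpinLockAcquire(&lock->mutex); sees *valptr == oldval, so mustwait = true  */
/* A: SpinLockRelease(&lock->mutex);                                             */
/* B: LWLockUpdateVar() changes *valptr and wakes everyone currently queued --   */
/*    A is not on the wait queue yet, so it misses that wakeup                   */
/* A: LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE); the recheck only looks at       */
/*    LW_VAL_EXCLUSIVE, which B still holds, so A goes to sleep even though the  */
/*    variable no longer matches oldval                                          */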
Looking at LWLockAcquireWithVar(), it's also no longer holding the
spinlock while it updates the Var. That's not cool either :-(.
- Heikki
On 06/30/2015 07:37 PM, Alvaro Herrera wrote:
Jeff Janes wrote:
I've gotten the LWLock deadlock again. User backend 24841 holds
WALInsertLock 7 and is blocked attempting to acquire 6. So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c).
Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.
I don't think pg_stat_statement has anything to do with this. You can
see from the backtrace that pg_stat_statement is enabled, as the call
went through the pgss_ExecutorRun executor hook, but that's all.
- Heikki
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:
/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;
if (!mustwait)
break; /* the lock was free or value didn't match */
/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
while it updates the Var. That's not cool either :-(.
Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.
Greetings,
Andres Freund
On 06/30/2015 10:09 PM, Andres Freund wrote:
On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:
/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;
if (!mustwait)
break; /* the lock was free or value didn't match */
/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.
Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
while it updates the Var. That's not cool either :-(.
Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.
I'll take a stab at fixing this tomorrow. I take the blame for not
documenting the semantics of LWLockAcquireWithVar() and friends well
enough...
- Heikki
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.
Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.
I'll take a stab at fixing this tomorrow.
Thanks! Do you mean both or "just" the second issue?
Greetings,
Andres Freund
On 06/30/2015 11:24 PM, Andres Freund wrote:
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.
Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.
I'll take a stab at fixing this tomorrow.
Thanks! Do you mean both or "just" the second issue?
Both. Here's the patch.
Previously, LWLockAcquireWithVar set the variable associated with the
lock atomically with acquiring it. Before the lwlock-scalability
changes, that was straightforward because you held the spinlock anyway,
but it's a lot harder/expensive now. So I changed the way acquiring a
lock with a variable works. There is now a separate flag,
LW_FLAG_VAR_SET, which indicates that the current lock holder has
updated the variable. The LWLockAcquireWithVar function is gone - you
now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET
flag, and you can call LWLockUpdateVar() after that if you want to set
the variable immediately. LWLockWaitForVar() always waits if the flag is
not set, i.e. it will not return regardless of the variable's value, if
the current lock-holder has not updated it yet.
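(For illustration, with this patch the WALInsertLockAcquireExclusive() path
in xlog.c -- see the hunk in the attached patch -- ends up doing roughly:)
LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE); /* acquires the lock and clears LW_FLAG_VAR_SET */
LWLockUpdateVar(&WALInsertLocks[i].l.lock,
                &WALInsertLocks[i].l.insertingAt,
                PG_UINT64_MAX); /* publishes a value: sets LW_FLAG_VAR_SET and wakes LWLockWaitForVar() waiters */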
This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?
- Heikki
Attachments:
0001-Fix-LWLock-variable-support-broken-by-the-lwlock-sca.patch (application/x-patch)
From ce90bd9a1e2c8b5783ebeea83594da7d3a1c63de Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 15 Jul 2015 18:33:28 +0300
Subject: [PATCH 1/1] Fix LWLock "variable" support, broken by the lwlock
scalability patch
---
src/backend/access/transam/xlog.c | 15 ++---
src/backend/storage/lmgr/lwlock.c | 135 ++++++++++++++++++++++----------------
src/include/storage/lwlock.h | 1 -
3 files changed, 83 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..2f34346 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
* The insertingAt value is initially set to 0, as we don't know our
* insert location yet.
*/
- immed = LWLockAcquireWithVar(&WALInsertLocks[MyLockNo].l.lock,
- &WALInsertLocks[MyLockNo].l.insertingAt,
- 0);
+ immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
if (!immed)
{
/*
@@ -1442,13 +1440,12 @@ WALInsertLockAcquireExclusive(void)
*/
for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
{
- LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
- &WALInsertLocks[i].l.insertingAt,
- PG_UINT64_MAX);
+ LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+ LWLockUpdateVar(&WALInsertLocks[i].l.lock,
+ &WALInsertLocks[i].l.insertingAt,
+ PG_UINT64_MAX);
}
- LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
- &WALInsertLocks[i].l.insertingAt,
- 0);
+ LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
holdingAllLocks = true;
}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 46cab49..92a1aef 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -11,12 +11,12 @@
* LWLocks to protect its shared state.
*
* In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value. The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
- * without releasing the lock by calling LWLockUpdateVar. LWLockWaitForVar
- * waits for the variable to be updated, or until the lock is free. The
- * meaning of the variable is up to the caller, the lightweight lock code
- * just assigns and compares it.
+ * to wait until a variable changes value. The variable is initially put
+ * in a "not-set" state when the lock is acquired with LWLockAcquire, and
+ * can be updated without releasing the lock by calling LWLockUpdateVar.
+ * LWLockWaitForVar waits for the variable to be updated, or until the lock
+ * is free. The meaning of the variable is up to the caller, the lightweight
+ * lock code just assigns and compares it.
*
* Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -99,6 +99,7 @@ extern slock_t *ShmemLock;
#define LW_FLAG_HAS_WAITERS ((uint32) 1 << 30)
#define LW_FLAG_RELEASE_OK ((uint32) 1 << 29)
+#define LW_FLAG_VAR_SET ((uint32) 1 << 28)
#define LW_VAL_EXCLUSIVE ((uint32) 1 << 24)
#define LW_VAL_SHARED 1
@@ -150,9 +151,6 @@ static LWLockHandle held_lwlocks[MAX_SIMUL_LWLOCKS];
static int lock_addin_request = 0;
static bool lock_addin_request_allowed = true;
-static inline bool LWLockAcquireCommon(LWLock *l, LWLockMode mode,
- uint64 *valptr, uint64 val);
-
#ifdef LWLOCK_STATS
typedef struct lwlock_stats_key
{
@@ -601,7 +599,14 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
{
lock_free = (expected_state & LW_LOCK_MASK) == 0;
if (lock_free)
- desired_state += LW_VAL_EXCLUSIVE;
+ {
+ /*
+ * Make the lock exclusively-locked for us, and make the
+ * associated variable, if any, un-set.
+ */
+ desired_state =
+ (desired_state + LW_VAL_EXCLUSIVE) & ~LW_FLAG_VAR_SET;
+ }
}
else
{
@@ -745,7 +750,7 @@ LWLockWakeup(LWLock *lock)
* NB: Mode can be LW_WAIT_UNTIL_FREE here!
*/
static void
-LWLockQueueSelf(LWLock *lock, LWLockMode mode)
+LWLockQueueSelf(LWLock *lock, LWLockMode mode, bool spinlock_held)
{
#ifdef LWLOCK_STATS
lwlock_stats *lwstats;
@@ -764,11 +769,14 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
if (MyProc->lwWaiting)
elog(PANIC, "queueing for lock while waiting on another one");
+ if (!spinlock_held)
+ {
#ifdef LWLOCK_STATS
- lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
- SpinLockAcquire(&lock->mutex);
+ SpinLockAcquire(&lock->mutex);
#endif
+ }
/* setting the flag is protected by the spinlock */
pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_HAS_WAITERS);
@@ -783,7 +791,8 @@ LWLockQueueSelf(LWLock *lock, LWLockMode mode)
dlist_push_tail(&lock->waiters, &MyProc->lwWaitLink);
/* Can release the mutex now */
- SpinLockRelease(&lock->mutex);
+ if (!spinlock_held)
+ SpinLockRelease(&lock->mutex);
#ifdef LOCK_DEBUG
pg_atomic_fetch_add_u32(&lock->nwaiters, 1);
@@ -900,25 +909,7 @@ LWLockDequeueSelf(LWLock *lock)
* Side effect: cancel/die interrupts are held off until lock release.
*/
bool
-LWLockAcquire(LWLock *l, LWLockMode mode)
-{
- return LWLockAcquireCommon(l, mode, NULL, 0);
-}
-
-/*
- * LWLockAcquireWithVar - like LWLockAcquire, but also sets *valptr = val
- *
- * The lock is always acquired in exclusive mode with this function.
- */
-bool
-LWLockAcquireWithVar(LWLock *l, uint64 *valptr, uint64 val)
-{
- return LWLockAcquireCommon(l, LW_EXCLUSIVE, valptr, val);
-}
-
-/* internal function to implement LWLockAcquire and LWLockAcquireWithVar */
-static inline bool
-LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
+LWLockAcquire(LWLock *lock, LWLockMode mode)
{
PGPROC *proc = MyProc;
bool result = true;
@@ -1003,7 +994,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
*/
/* add to the queue */
- LWLockQueueSelf(lock, mode);
+ LWLockQueueSelf(lock, mode, false);
/* we're now guaranteed to be woken up if necessary */
mustwait = LWLockAttemptLock(lock, mode);
@@ -1065,10 +1056,6 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
result = false;
}
- /* If there's a variable associated with this lock, initialize it */
- if (valptr)
- *valptr = val;
-
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode);
/* Add lock to list of locks held by this backend */
@@ -1181,7 +1168,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
if (mustwait)
{
- LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);
+ LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);
mustwait = LWLockAttemptLock(lock, mode);
@@ -1315,24 +1302,53 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
{
bool mustwait;
uint64 value;
+ uint32 state;
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
- if (mustwait)
- {
- /*
- * Perform comparison using spinlock as we can't rely on atomic 64
- * bit reads/stores.
- */
+ if (!mustwait)
+ break; /* the lock was free */
+
+ /*
+ * Check if the variable is set and whether its current value matches
+ * the old value the caller provided. We have to hold the spinlock
+ * while we check that, until we have queued ourselves to the lock,
+ * so that the lock-holder cannot change the value without waking us
+ * up.
+ */
#ifdef LWLOCK_STATS
- lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
- SpinLockAcquire(&lock->mutex);
+ SpinLockAcquire(&lock->mutex);
#endif
+ /*
+ * XXX: We can significantly optimize this on platforms with 64bit
+ * atomics.
+ */
+ state = pg_atomic_read_u32(&lock->state);
+ if ((state & LW_VAL_EXCLUSIVE) == 0)
+ {
/*
- * XXX: We can significantly optimize this on platforms with 64bit
- * atomics.
+ * The lock is free (or at least not exclusively-locked). We can
+ * return to caller immediately.
+ */
+ result = true;
+ mustwait = false;
+ }
+ else if ((state & LW_FLAG_VAR_SET) == 0)
+ {
+ /*
+ * The lock is held, but the locker has not set the variable
+ * yet. We have to wait.
+ */
+ mustwait = true;
+ }
+ else
+ {
+ /*
+ * The lock is held, and the locker has set the variable. Compare
+ * the variable's current value.
*/
value = *valptr;
if (value != oldval)
@@ -1341,23 +1357,21 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
mustwait = false;
*newval = value;
}
- else
- mustwait = true;
- SpinLockRelease(&lock->mutex);
}
- else
- mustwait = false;
if (!mustwait)
- break; /* the lock was free or value didn't match */
+ {
+ SpinLockRelease(&lock->mutex);
+ break; /* the lock was free or the value didn't match */
+ }
- /*
+ /* XXX: is this racy still?
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
- LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);
+ LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, true);
/*
* Set RELEASE_OK flag, to make sure we get woken up as soon as the
@@ -1367,10 +1381,13 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
/*
* We're now guaranteed to be woken up if necessary. Recheck the
- * lock's state.
+ * lock's state. No-one could change the protected variable's value
+ * while we're holding the spinlock, but the lock may have been freed.
*/
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+ SpinLockRelease(&lock->mutex);
+
/* Ok, lock is free after we queued ourselves. Undo queueing. */
if (!mustwait)
{
@@ -1495,6 +1512,8 @@ LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
dlist_push_tail(&wakeup, &waiter->lwWaitLink);
}
+ pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_VAR_SET);
+
/* We are done updating shared state of the lock itself. */
SpinLockRelease(&lock->mutex);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index cff3b99..a0ee608 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -185,7 +185,6 @@ extern void LWLockRelease(LWLock *lock);
extern void LWLockReleaseAll(void);
extern bool LWLockHeldByMe(LWLock *lock);
-extern bool LWLockAcquireWithVar(LWLock *lock, uint64 *valptr, uint64 val);
extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
extern void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 value);
--
2.1.4
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Both. Here's the patch.
Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.
I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.
I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.
bt and bt full are shown below.
Cheers,
Jeff
#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
#4 0x000000000093b53a in FunctionCall2Coll (flinfo=0x7fffee073e60,
collation=0, arg1=140737186840448, arg2=0) at fmgr.c:1323
#5 0x00000000004d5c7a in index_vacuum_cleanup (info=0x7fffee073f80,
stats=0x0) at indexam.c:717
#6 0x0000000000664f69 in lazy_cleanup_index (indrel=0x7faafbcace20,
stats=0x0, vacrelstats=0x28809c8) at vacuumlazy.c:1400
#7 0x0000000000664142 in lazy_scan_heap (onerel=0x7faafbcab6d0,
vacrelstats=0x28809c8, Irel=0x2881090, nindexes=2, scan_all=1 '\001') at
vacuumlazy.c:1111
#8 0x0000000000662905 in lazy_vacuum_rel (onerel=0x7faafbcab6d0,
options=65, params=0x286ea00, bstrategy=0x286ea90) at vacuumlazy.c:247
#9 0x00000000006623e4 in vacuum_rel (relid=16409, relation=0x7fffee074550,
options=65, params=0x286ea00) at vacuum.c:1377
#10 0x0000000000660bea in vacuum (options=65, relation=0x7fffee074550,
relid=16409, params=0x286ea00, va_cols=0x0, bstrategy=0x286ea90,
isTopLevel=1 '\001')
at vacuum.c:295
#11 0x000000000075caa9 in autovacuum_do_vac_analyze (tab=0x286e9f8,
bstrategy=0x286ea90) at autovacuum.c:2811
#12 0x000000000075be67 in do_autovacuum () at autovacuum.c:2331
#13 0x000000000075ac1c in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1648
#14 0x000000000075a84d in StartAutoVacWorker () at autovacuum.c:1455
#15 0x000000000076f745 in StartAutovacuumWorker () at postmaster.c:5261
#16 0x000000000076effd in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#17 <signal handler called>
#18 0x0000003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#19 0x000000000076a8f0 in ServerLoop () at postmaster.c:1582
#20 0x000000000076a141 in PostmasterMain (argc=4, argv=0x27b2cf0) at
postmaster.c:1263
#21 0x00000000006c1e6e in main (argc=4, argv=0x27b2cf0) at main.c:223
#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
No locals.
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
buffer = 186
page = 0x7faafc565b20 ""
info = 0x7fffee073f80
stats = 0x2880858
index = 0x7faafbcace20
needLock = 1 '\001'
npages = 22569
blkno = 12025
totFreePages = 11502
ginstate = {index = 0x7faafbcace20, oneCol = 1 '\001', origTupdesc
= 0x7faafbcad150, tupdesc = {0x7faafbcad150, 0x0 <repeats 31 times>},
compareFn = {{fn_addr = 0x90224d <bttextcmp>, fn_oid = 360,
fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, extractValueFn = {{
fn_addr = 0x494adc <ginarrayextract>, fn_oid = 2743, fn_nargs
= 3, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, extractQueryFn = {{
fn_addr = 0x494beb <ginqueryarrayextract>, fn_oid = 2774,
fn_nargs = 7, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, consistentFn = {{
fn_addr = 0x494d67 <ginarrayconsistent>, fn_oid = 2744,
fn_nargs = 8, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, triConsistentFn = {{
fn_addr = 0x494efd <ginarraytriconsistent>, fn_oid = 3920,
fn_nargs = 7, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, comparePartialFn = {{fn_addr = 0, fn_oid = 0,
fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0,
fn_expr = 0x0} <repeats 32 times>}, canPartialMatch = '\000'
<repeats 31 times>, supportCollation = {100, 0 <repeats 31 times>}}
idxStat = {nPendingPages = 0, nTotalPages = 0, nEntryPages = 522,
nDataPages = 0, nEntries = 11000, ginVersion = 0}
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 06/30/2015 11:24 PM, Andres Freund wrote:
On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:
Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.
Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.
Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.
I'll take a stab at fixing this tomorrow.
Thanks! Do you mean both or "just" the second issue?
Both. Here's the patch.
Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.
This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?
After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince me that this fixes the problem.
Cheers,
Jeff
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi>
wrote:
Both. Here's the patch.
Previously, LWLockAcquireWithVar set the variable associated with the
lock atomically with acquiring it. Before the lwlock-scalability changes,
that was straightforward because you held the spinlock anyway, but it's a
lot harder/expensive now. So I changed the way acquiring a lock with a
variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
indicates that the current lock holder has updated the variable. The
LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
which always clears the LW_FLAG_VAR_SET flag, and you can call
LWLockUpdateVar() after that if you want to set the variable immediately.
LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
return regardless of the variable's value, if the current lock-holder has
not updated it yet.
I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.
I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.
bt and bt full are shown below.
Cheers,
Jeff
#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
It now looks like this *is* unrelated to the LWLock issue. The assert that
it is tripping over was added just recently (302ac7f27197855afa8c) and so I
had not been testing under its presence until now. It looks like it is
finding all-zero pages (index extended but then a crash before initializing
the page?) and it doesn't like them.
(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>
Presumably before this assert, such pages would just be permanently
orphaned.
Cheers,
Jeff
On 2015-07-16 PM 04:03, Jeff Janes wrote:
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Both. Here's the patch.
Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.
I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.
I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.
bt and bt full are shown below.
I got a similar assert failure but with a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:
(gdb) bt
#0 0x0000003969632625 in raise () from /lib64/libc.so.6
#1 0x0000003969633e05 in abort () from /lib64/libc.so.6
#2 0x000000000092eb9e in ExceptionalCondition (conditionName=0x9c2220
"!(((PageHeader) (page))->pd_special >= (__builtin_offsetof
(PageHeaderData, pd_linp)))",
errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
lineNumber=903) at assert.c:54
#3 0x00000000004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
orig_blkno=9) at nbtree.c:903
#4 0x00000000004e0067 in btvacuumscan (info=0x7fff2c765cd0,
stats=0x279f7d0, callback=0x668f6d <lazy_tid_reaped>,
callback_state=0x279e338, cycleid=49190)
at nbtree.c:821
#5 0x00000000004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6 0x0000000000939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at
fmgr.c:1375
#7 0x00000000004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
stats=0x0, callback=0x668f6d <lazy_tid_reaped>, callback_state=0x279e338)
at indexam.c:690
#8 0x000000000066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9 0x00000000006678a8 in lazy_scan_heap (onerel=0x274ec90,
vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at
vacuumlazy.c:1098
#10 0x00000000006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x0000000000665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x00000000006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
isTopLevel=1 '\001')
at vacuum.c:293
#13 0x000000000075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x000000000075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x000000000075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1647
#16 0x000000000075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x000000000076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x000000000076f28a in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#19 <signal handler called>
#20 0x00000039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x000000000076ace7 in ServerLoop () at postmaster.c:1582
#22 0x000000000076a538 in PostmasterMain (argc=3, argv=0x26f9330) at
postmaster.c:1263
#23 0x00000000006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223
Thanks,
Amit
Hi,
On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince that this fixes the problem.
What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...
Andres
On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
After applying this patch to commit fdf28853ae6a397497b79f, it has
survived
testing long enough to convince that this fixes the problem.
What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...
It was the torn-page fault-injection code here:
It is not a minimal set, I don't know if all parts of this are necessary to
reproduce it. The whole crash-recovery cycling might not even be
important.
Compiled with:
./configure --enable-debug --with-libxml --with-perl --with-python
--with-ldap --with-openssl --with-gssapi
--prefix=/home/jjanes/pgsql/torn_bisect/
(Also with or without --enable-cassert).
I just ran "sh do.sh >& do.out" and eventually it stopped producing output,
and I find everything hung up.
Cheers,
Jeff
Hi,
Finally getting to this.
On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:
Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears the
LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
want to set the variable immediately.
This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?
The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?
In an earlier email you say:
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?
I'll try to reproduce the problem now. But I do wonder if it's possibly
just the missing spinlock during the update.
Andres
On 07/29/2015 02:39 PM, Andres Freund wrote:
On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:
Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears the
LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
want to set the variable immediately.
This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?
The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?
In an earlier email you say:
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?
If you release the spinlock before LWLockQueueSelf(), then no. It's
possible that the backend you wanted to wait for updates the variable's
value before you've queued up. Or even releases the lock, and it gets
re-acquired by another backend, before you've queued up.
Or are you thinking of just moving the SpinLockRelease to after
LWLockQueueSelf(), i.e. to the other side of the "mustwait = ..." line?
That'd probably be ok.
- Heikki
On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote:
On 07/29/2015 02:39 PM, Andres Freund wrote:
In an earlier email you say:
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?
If you release the spinlock before LWLockQueueSelf(), then no. It's possible
that the backend you wanted to wait for updates the variable's value before
you've queued up. Or even releases the lock, and it gets re-acquired by
another backend, before you've queued up.
For normal locks the equivalent problem is solved by re-checking whether
a conflicting lock is still held after enqueuing. Why don't we do the
same here? Doing it that way has the big advantage that we can just
remove the spinlocks entirely on platforms with atomic 64bit
reads/writes.
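(For illustration, the queue-then-recheck protocol being referred to looks
roughly like the sketch below. This is not the actual lwlock.c code; it
borrows the helper names from the patch posted later in this thread, and it
omits the interrupt handling and the semaphore/extraWaits bookkeeping.)

for (;;)
{
    /* First check without being queued. */
    if (!LWLockConflictsWithVar(lock, valptr, oldval, newval, &result))
        break;                  /* lock free, or the variable changed */

    /* Queue ourselves, so that a releaser/updater is obliged to wake us. */
    LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);

    /* Re-check now that we are queued; the state may have changed. */
    if (!LWLockConflictsWithVar(lock, valptr, oldval, newval, &result))
    {
        LWLockDequeueSelf(lock);    /* nothing to wait for after all */
        break;
    }

    /* Still conflicting: sleep on the process semaphore until woken. */
}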
Greetings,
Andres Freund
On 07/29/2015 03:08 PM, Andres Freund wrote:
On 2015-07-29 14:55:54 +0300, Heikki Linnakangas wrote:
On 07/29/2015 02:39 PM, Andres Freund wrote:
In an earlier email you say:
After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.
But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?
If you release the spinlock before LWLockQueueSelf(), then no. It's possible
that the backend you wanted to wait for updates the variable's value before
you've queued up. Or even releases the lock, and it gets re-acquired by
another backend, before you've queued up.
For normal locks the equivalent problem is solved by re-checking whether
a conflicting lock is still held after enqueuing. Why don't we do the
same here? Doing it that way has the big advantage that we can just
remove the spinlocks entirely on platforms with atomic 64bit
reads/writes.
Ah, ok, that should work, as long as you also re-check the variable's
value after queueing. Want to write the patch, or should I?
- Heikki
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
Ah, ok, that should work, as long as you also re-check the variable's value
after queueing. Want to write the patch, or should I?
I'll try. Shouldn't be too hard.
Andres
On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
Ah, ok, that should work, as long as you also re-check the variable's value
after queueing. Want to write the patch, or should I?
I'll try. Shouldn't be too hard.
What do you think about something roughly like the attached?
- Andres
Attachments:
0001-Other-way-to-fix-the-issues-around-the-broken-LWLock.patch (text/x-patch; charset=us-ascii)
From 2879408d8da7714ab6af6238dfe1c8a535800af3 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 29 Jul 2015 15:06:54 +0200
Subject: [PATCH] Other way to fix the issues around the broken LWLock variable
support.
---
src/backend/storage/lmgr/lwlock.c | 105 +++++++++++++++++++++++++-------------
1 file changed, 69 insertions(+), 36 deletions(-)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index e5566d1..c64f20d 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -1066,7 +1066,15 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
/* If there's a variable associated with this lock, initialize it */
if (valptr)
+ {
+#ifdef LWLOCK_STATS
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+ SpinLockAcquire(&lock->mutex);
+#endif
*valptr = val;
+ SpinLockRelease(&lock->mutex);
+ }
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode);
@@ -1259,6 +1267,60 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
}
/*
+ * Does the lwlock in its current state need to wait for the variable value to
+ * change?
+ *
+ * If we don't need to wait, and it's because the value of the variable has
+ * changed, store the current value in newval.
+ *
+ * *result is set to true if the lock was free, and false otherwise.
+ */
+static bool
+LWLockConflictsWithVar(LWLock *lock,
+ uint64 *valptr, uint64 oldval, uint64 *newval,
+ bool *result)
+{
+ bool mustwait;
+ uint64 value;
+
+ mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+
+ if (!mustwait)
+ {
+ *result = true;
+ return false;
+ }
+
+ /*
+ * Perform comparison using spinlock as we can't rely on atomic 64 bit
+ * reads/stores. On platforms with a way to do atomic 64 bit reads/writes
+ * the spinlock can be optimized away.
+ */
+#ifdef LWLOCK_STATS
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+ SpinLockAcquire(&lock->mutex);
+#endif
+
+ *result = false;
+
+ value = *valptr;
+ if (value != oldval)
+ {
+ mustwait = false;
+ *newval = value;
+ }
+ else
+ {
+ mustwait = true;
+ }
+
+ SpinLockRelease(&lock->mutex);
+
+ return mustwait;
+}
+
+/*
* LWLockWaitForVar - Wait until lock is free, or a variable is updated.
*
* If the lock is held and *valptr equals oldval, waits until the lock is
@@ -1313,39 +1375,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
for (;;)
{
bool mustwait;
- uint64 value;
-
- mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
-
- if (mustwait)
- {
- /*
- * Perform comparison using spinlock as we can't rely on atomic 64
- * bit reads/stores.
- */
-#ifdef LWLOCK_STATS
- lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
-#else
- SpinLockAcquire(&lock->mutex);
-#endif
- /*
- * XXX: We can significantly optimize this on platforms with 64bit
- * atomics.
- */
- value = *valptr;
- if (value != oldval)
- {
- result = false;
- mustwait = false;
- *newval = value;
- }
- else
- mustwait = true;
- SpinLockRelease(&lock->mutex);
- }
- else
- mustwait = false;
+ mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+ &result);
if (!mustwait)
break; /* the lock was free or value didn't match */
@@ -1365,12 +1397,13 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
/*
- * We're now guaranteed to be woken up if necessary. Recheck the
- * lock's state.
+ * We're now guaranteed to be woken up if necessary. Recheck the lock
+ * and variables state.
*/
- mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+ mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+ &result);
- /* Ok, lock is free after we queued ourselves. Undo queueing. */
+ /* Ok, no conflict after we queued ourselves. Undo queueing. */
if (!mustwait)
{
LOG_LWDEBUG("LWLockWaitForVar", lock, "free, undoing queue");
--
2.4.0.rc2.1.g3d6bc9a
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:
After applying this patch to commit fdf28853ae6a397497b79f, it has
survived
testing long enough to convince that this fixes the problem.
What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...
It was the torn-page fault-injection code here:
It is not a minimal set, I don't know if all parts of this are necessary
to reproduce it. The whole crash-recovery cycling might not even be
important.
I've reproduced it again against commit b2ed8edeecd715c8a23ae462.
It took 5 hours on an 8 core "Intel(R) Xeon(R) CPU E5-2650".
I also reproduced it in 3 hours on the same machine with both JJ_torn_page
and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
fault-injection patch should not be necessary to get the issue.
Cheers,
Jeff
On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
I've reproduced it again against commit b2ed8edeecd715c8a23ae462.
It took 5 hours on an 8 core "Intel(R) Xeon(R) CPU E5-2650".
I also reproduced it in 3 hours on the same machine with both JJ_torn_page
and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
fault-injection patch should not be necessary to get the issue.
Hm, that's a single socket or dual socket E5-2650 system? That CPU
itself has 8 cores, and can work in a dual socket system, that's why I
ask.
On Wed, Jul 29, 2015 at 9:26 AM, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-29 09:23:32 -0700, Jeff Janes wrote:
On Tue, Jul 28, 2015 at 9:06 AM, Jeff Janes <jeff.janes@gmail.com>
wrote:
I've reproduced it again against commit b2ed8edeecd715c8a23ae462.
It took 5 hours on an 8 core "Intel(R) Xeon(R) CPU E5-2650".
I also reproduced it in 3 hours on the same machine with both
JJ_torn_page
and JJ_xid set to zero (i.e. turned off, no induced crashes), so the
fault-injection patch should not be necessary to get the issue.
Hm, that's a single socket or dual socket E5-2650 system? That CPU
itself has 8 cores, and can work in a dual socket system, that's why I
ask.
It is a virtual machine, and I think a property of the VM software is that
any given virtual machine can only run on one socket of the underlying
hardware. The real machine is dual socket, though.
Cheers,
Jeff
On 07/29/2015 04:10 PM, Andres Freund wrote:
On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
Ah, ok, that should work, as long as you also re-check the variable's value
after queueing. Want to write the patch, or should I?
I'll try. Shouldn't be too hard.
What do you think about something roughly like the attached?
Hmm. Imagine this:
Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting
on it. The lock holder releases the lock, and wakes up A. But before A
wakes up and sees that the lock is free, another backend acquires the
lock again. It runs LWLockAcquireWithVar to the point just before
setting the variable's value. Now A wakes up, sees that the lock is
still (or again) held, and that the variable's value still matches the
old one, and goes back to sleep. The new lock holder won't wake it up
until it updates the value again, or releases the lock.
I think that's OK given the current usage of this in WALInsertLocks, but
it's scary. At the very least you need comments to explain that race
condition. You didn't like the new LW_FLAG_VAR_SET flag and the API
changes I proposed? That eliminates the race condition.
Either way, there is a race condition that if the new lock holder sets
the variable to the *same* value as before, the old waiter won't
necessarily wake up even though the lock was free or had a different
value in between. But that's easier to explain and understand than the
fact that the value set by LWLockAcquireWithVar() might not be seen by a
waiter, until you release the lock or update it again.
Another idea is to have LWLockAcquire check the wait queue after setting
the variable, and wake up anyone who's already queued up. You could just
call LWLockUpdateVar() from LWLockAcquireCommon to set the variable. I
would prefer the approach from my previous patch more, though. That
would avoid having to acquire the spinlock in LWLockAcquire altogether,
and I actually like the API change from that independently from any
correctness issues.
The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK
in principle. You'll want to change LWLockConflictsWithVar() so that the
spinlock is held only over the "value = *valptr" line, though; the other
stuff just modifies local variables and doesn't need to be protected by
the spinlock. Also, when you enter LWLockWaitForVar(), you're checking
if the lock is held twice in a row; first at the top of the function,
and again inside LWLockConflictsWithVar. You could just remove the
"quick test" at the top.
- Heikki
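(As a sketch of the suggestion above, the body of LWLockConflictsWithVar()
would then narrow to roughly the following; illustrative only, not the hunk
that was eventually committed:)

/* Only the 64-bit read itself needs the spinlock. */
SpinLockAcquire(&lock->mutex);
value = *valptr;
SpinLockRelease(&lock->mutex);

if (value != oldval)
{
    mustwait = false;
    *newval = value;        /* report the changed value to the caller */
}
else
    mustwait = true;

return mustwait;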
On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
it. The lock holder releases the lock, and wakes up A. But before A wakes up
and sees that the lock is free, another backend acquires the lock again. It
runs LWLockAcquireWithVar to the point just before setting the variable's
value. Now A wakes up, sees that the lock is still (or again) held, and that
the variable's value still matches the old one, and goes back to sleep. The
new lock holder won't wake it up until it updates the value again, or
releases the lock.
I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today
both in 9.4 and 9.5.
But I think that's fine because that "race" seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.
You didn't like the new LW_FLAG_VAR_SET flag and the API changes I
proposed?
I do slightly dislike the additional bit of math in AttemptLock. But
what I really disliked about the patch is the reliance on holding the
spinlock over longer times and conditionally acquiring spinlocks. I
didn't see a need for the flags change to fix the issue at hand, that's
why I didn't incorporate it. I'm not fundamentally against it.
Either way, there is a race condition that if the new lock holder sets the
variable to the *same* value as before, the old waiter won't necessarily
wake up even though the lock was free or had a different value in between.
But that's easier to explain and understand than the fact that the value set
by LWLockAcquireWithVar() might not be seen by a waiter, until you release
the lock or update it again.
I can't really see a use for the API that'd allow that and care about the
waits. Because quite fundamentally you could just as well have started
waiting right after the variable was set to that value.
Another idea is to have LWLockAcquire check the wait queue after setting the
variable, and wake up anyone who's already queued up.
Ugh, not a fan of that.
The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in
principle. You'll want to change LWLockConflictsWithVar() so that the
spinlock is held only over the "value = *valptr" line, though; the other
stuff just modifies local variables and doesn't need to be protected by the
spinlock.
Good point.
Also, when you enter LWLockWaitForVar(), you're checking if the
lock is held twice in a row; first at the top of the function, and again
inside LWLockConflictsWithVar. You could just remove the "quick test" at the
top.
Yea, I was thinking about removing it. The first check was there
previously, so I left it in place. We do execute a bit more code once
we've disabled interrupts (extraWaits, re-enabling interrupts). I don't
think it'll matter much, but it seemed like an independent thing.
Greetings,
Andres Freund
On 07/29/2015 09:35 PM, Andres Freund wrote:
On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
it. The lock holder releases the lock, and wakes up A. But before A wakes up
and sees that the lock is free, another backend acquires the lock again. It
runs LWLockAcquireWithVar to the point just before setting the variable's
value. Now A wakes up, sees that the lock is still (or again) held, and that
the variable's value still matches the old one, and goes back to sleep. The
new lock holder won't wake it up until it updates the value again, or
releases the lock.
I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today
both in 9.4 and 9.5.
In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
until it has updated the variable. And LWLockWaitForVar() holds the
spinlock when it checks that the lock is held and that the variable's
value matches. So it cannot happen on 9.4.
To reiterate, with 9.5, it's possible that a backend is sleeping in
LWLockWaitForVar(oldvar=123), even though the lock is currently held by
another backend with value 124. That seems wrong, or surprising at the
very least.
But I think that's fine because that "race" seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.
I'm not talking about setting it to the same value. I'm talking about
setting it to a different value. (I talked about setting it to the same
value later in the email, and I agree that's a pretty fundamental
problem and exists with 9.4 as well).
- Heikki
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-29 14:22:23 +0200, Andres Freund wrote:
On 2015-07-29 15:14:23 +0300, Heikki Linnakangas wrote:
Ah, ok, that should work, as long as you also re-check the variable's
value
after queueing. Want to write the patch, or should I?
I'll try. Shouldn't be too hard.
What do you think about something roughly like the attached?
I've not evaluated the code, but applying it does solve the problem I was
seeing.
Cheers,
Jeff
On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
until it has updated the variable. And LWLockWaitForVar() holds the spinlock
when it checks that the lock is held and that the variable's value matches.
So it cannot happen on 9.4.
The first paragraph talked about "the same value", but that was just
referring to it not yet having been cleared, I think...
To reiterate, with 9.5, it's possible that a backend is sleeping in
LWLockWaitForVar(oldvar=123), even though the lock is currently held by
another backend with value 124. That seems wrong, or surprising at the very
least.
With my patch that can't really happen that way though? The value is
re-checked after queuing. If it has changed by then we're done. And if
it hasn't yet changed we're guaranteed to be woken up once it's being
changed?
I generally don't mind adding some sort of flag clearing or such, but I'd
rather not have it in the retry loop in the general LWLockAttemptLock
path - I found that very small amounts of instructions in there have a
measurable impact.
Greetings,
Andres Freund
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <andres@anarazel.de> wrote:
What do you think about something roughly like the attached?
I've not evaluated the code, but applying it does solve the problem I was
seeing.
Cool, thanks for testing! How long did you run the test and how long did
it, on average, previously take for the issue to occur?
Regards,
Andres
On 07/30/2015 09:14 PM, Andres Freund wrote:
On 2015-07-30 17:36:52 +0300, Heikki Linnakangas wrote:
In 9.4, LWLockAcquire holds the spinlock when it marks the lock as held,
until it has updated the variable. And LWLockWaitForVar() holds the spinlock
when it checks that the lock is held and that the variable's value matches.
So it cannot happen on 9.4.
The first paragraph talked about "the same value", but that was just
referring to it not yet having been cleared, I think...
To reiterate, with 9.5, it's possible that a backend is sleeping in
LWLockWaitForVar(oldvar=123), even though the lock is currently held by
another backend with value 124. That seems wrong, or surprising at the very
least.
With my patch that can't really happen that way though? The value is
re-checked after queuing. If it has changed by then we're done. And if
it hasn't yet changed we're guaranteed to be woken up once it's being
changed?
Ok, let me try explaining it again. With your patch,
LWLockAcquireWithVar looks like this (simplified, assuming the lock is
free):
1. Set LW_VAL_EXCLUSIVE
2. (Acquire spinlock) Set *valptr = val (Release spinlock)
LWLockWaitForVar looks like this:
1. Check if lock is free
2. (Acquire Spinlock) Read *valptr, compare with oldval (Release spinlock)
3. LWLockQueueSelf
4. Re-check that *valptr is still equal to oldval
5. sleep.
This is the race condition:
Backend A                          Backend B
---------                          ---------
LWLockAcquireWithVar(123)
  Set LW_VAL_EXCLUSIVE
  Set *valptr = 123
                                   LWLockWaitForVar(123)
                                     See that the lock is held
                                     Read *valptr, it matches
                                     LWLockQueueSelf
LWLockRelease()
LWLockAcquireWithVar(124)
  Set LW_VAL_EXCLUSIVE
                                   wakes up
                                   See that the lock is still/again held
                                   Read *valptr, it's still 123
                                   LWLockQueueSelf
                                   Re-check that *valptr is still 123.
                                   go to sleep.
  Set *valptr = 124
Now, Backend B's LWLockWaitForVar(123) call is sleeping, even though the
lock was free'd and reacquired with a different value, 124. It won't wake
up until A updates the value or releases the lock again.
This was not possible in 9.4. It was possible in 9.4 too when the value
was the same in both LWLockAcquireWithVar() calls, and I think that's
acceptable, but not when the values differ.
I generally don't mind adding some sort of flag clearing or such, but I'd
rather not have it in the retry loop in the general LWLockAttemptLock
path - I found that very small amounts of instructions in there have a
measurable impact.
I doubt clearing a bit would matter there, although you might have a
better instinct on that...
- Heikki
On Thu, Jul 30, 2015 at 8:22 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <andres@anarazel.de>
wrote:
What do you think about something roughly like the attached?
I've not evaluated the code, but applying it does solve the problem I was
seeing.
Cool, thanks for testing! How long did you run the test and how long did
it, on average, previously take for the issue to occur?
I had run it for 24 hours, while it usually took less than 8 hours to lock
up before. I did see it within a few minutes one time when I checked out a
new HEAD and then forgot to re-apply your or Heikki's patch.
But now I've got the same situation again, after 15 hours, with your
patch. This is probably all down to luck. The only differences that I can
think of are that I advanced the base to e8e86fbc8b3619da54c, and turned on
JJ_vac and set log_autovacuum_min_duration=0.
checkpointer, 31516
(gdb) bt
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x00000000006481df in PGSemaphoreLock (sema=0x7f84ccd3c5f8) at
pg_sema.c:387
#2 0x00000000006a5754 in LWLockAcquireCommon (l=0x7f84c42edf80,
valptr=0x7f84c42edfa8, val=18446744073709551615) at lwlock.c:1041
#3 LWLockAcquireWithVar (l=0x7f84c42edf80, valptr=0x7f84c42edfa8,
val=18446744073709551615) at lwlock.c:915
#4 0x00000000004cbd86 in WALInsertLockAcquireExclusive () at xlog.c:1445
#5 0x00000000004d4195 in CreateCheckPoint (flags=64) at xlog.c:8181
#6 0x000000000064ea53 in CheckpointerMain () at checkpointer.c:493
#7 0x00000000004e328a in AuxiliaryProcessMain (argc=2,
argv=0x7fff6eed1f50) at bootstrap.c:428
#8 0x00000000006563f7 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5167
#9 0x000000000065884a in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2670
(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7f84ccd27c88,
sem = {semId = 1480359975, semNum = 12}, waitStatus = 0, procLatch =
{is_set = 1,
is_shared = 1 '\001', owner_pid = 31540}, lxid = 3209, pid = 31540,
pgprocno = 92, backendId = 12, databaseId = 16384, roleId = 10,
recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 2
'\002', lwWaitLink = {prev = 0x7f84c42edf10, next = 0x7f84ccd38e60},
waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0,
waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0},
myProcLocks = {{prev = 0x7f84ccd38bb8, next = 0x7f84ccd38bb8}, {prev =
0x7f84ccd38bc8, next = 0x7f84ccd38bc8}, {prev = 0x7f84ccd38bd8,
next = 0x7f84ccd38bd8}, {prev = 0x7f84ccd38be8, next =
0x7f84ccd38be8}, {prev = 0x7f84ccd38bf8, next = 0x7f84ccd38bf8}, {prev =
0x7f84ccd38c08,
next = 0x7f84ccd38c08}, {prev = 0x7f84ccd38c18, next =
0x7f84ccd38c18}, {prev = 0x7f84ccd38c28, next = 0x7f84ccd38c28}, {prev =
0x7f84ccd38c38,
next = 0x7f84ccd38c38}, {prev = 0x7f84ccd38c48, next =
0x7f84ccd38c48}, {prev = 0x7f84ccd38c58, next = 0x7f84ccd38c58}, {prev =
0x7f84ccd38c68,
next = 0x7f84ccd38c68}, {prev = 0x7f84ccd38c78, next =
0x7f84ccd38c78}, {prev = 0x7f84ccd38c88, next = 0x7f84ccd38c88}, {prev =
0x7f84ccb0ceb0,
next = 0x7f84ccb0ceb0}, {prev = 0x7f84ccd38ca8, next =
0x7f84ccd38ca8}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock =
0x7f84c42e9800,
fpLockBits = 160528697655296, fpRelId = {0, 0, 0, 0, 0, 0, 0, 0, 0, 2679,
2610, 16413, 16411, 16415, 16414, 16408}, fpVXIDLock = 1 '\001',
fpLocalTransactionId = 3209}
backend 31540:
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x00000000006481df in PGSemaphoreLock (sema=0x7f84ccd38b38) at
pg_sema.c:387
#2 0x00000000006a53d4 in LWLockWaitForVar (lock=0x7f84c42edf00,
valptr=0x7f84c42edf28, oldval=0, newval=0x7fff6eeceaf8) at lwlock.c:1438
#3 0x00000000004d3276 in WaitXLogInsertionsToFinish (upto=183341449216) at
xlog.c:1576
#4 0x00000000004d3b33 in AdvanceXLInsertBuffer (upto=183345636352,
opportunistic=<value optimized out>) at xlog.c:1888
#5 0x00000000004d8df2 in GetXLogBuffer (ptr=183345636352) at xlog.c:1669
#6 0x00000000004d8fea in CopyXLogRecordToWAL (rdata=0xbd2410,
fpw_lsn=<value optimized out>) at xlog.c:1276
#7 XLogInsertRecord (rdata=0xbd2410, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8 0x00000000004dc72c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9 0x0000000000478f41 in ginPlaceToPage (btree=0x7fff6eecee20,
stack=0x1be11e0, insertdata=<value optimized out>, updateblkno=<value
optimized out>,
childbuf=0, buildStats=0x0) at ginbtree.c:418
#10 0x00000000004798a3 in ginInsertValue (btree=0x7fff6eecee20,
stack=0x1be11e0, insertdata=0x7fff6eeceea0, buildStats=0x0) at
ginbtree.c:748
#11 0x00000000004734aa in ginEntryInsert (ginstate=0x7fff6eecf200,
attnum=39408, key=1, category=<value optimized out>, items=0x1c81c20,
nitem=54,
buildStats=0x0) at gininsert.c:234
#12 0x000000000047edbb in ginInsertCleanup (ginstate=0x7fff6eecf200,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x000000000047fae6 in ginHeapTupleFastInsert (ginstate=0x7fff6eecf200,
collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047386a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
...
(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x0, sem = {semId
= 1480425513, semNum = 0}, waitStatus = 0, procLatch = {is_set = 0,
is_shared = 1 '\001', owner_pid = 31516}, lxid = 0, pid = 31516,
pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0,
recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0
'\000', lwWaitLink = {prev = 0x7f84ccd3cc10, next = 0x7f84ccd3a5e0},
waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0,
waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0},
myProcLocks = {{prev = 0x7f84ccd3c678, next = 0x7f84ccd3c678}, {prev =
0x7f84ccd3c688, next = 0x7f84ccd3c688}, {prev = 0x7f84ccd3c698,
next = 0x7f84ccd3c698}, {prev = 0x7f84ccd3c6a8, next =
0x7f84ccd3c6a8}, {prev = 0x7f84ccd3c6b8, next = 0x7f84ccd3c6b8}, {prev =
0x7f84ccd3c6c8,
next = 0x7f84ccd3c6c8}, {prev = 0x7f84ccd3c6d8, next =
0x7f84ccd3c6d8}, {prev = 0x7f84ccd3c6e8, next = 0x7f84ccd3c6e8}, {prev =
0x7f84ccd3c6f8,
next = 0x7f84ccd3c6f8}, {prev = 0x7f84ccd3c708, next =
0x7f84ccd3c708}, {prev = 0x7f84ccd3c718, next = 0x7f84ccd3c718}, {prev =
0x7f84ccd3c728,
next = 0x7f84ccd3c728}, {prev = 0x7f84ccd3c738, next =
0x7f84ccd3c738}, {prev = 0x7f84ccd3c748, next = 0x7f84ccd3c748}, {prev =
0x7f84ccd3c758,
next = 0x7f84ccd3c758}, {prev = 0x7f84ccd3c768, next =
0x7f84ccd3c768}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock =
0x7f84c42e9d00,
fpLockBits = 0, fpRelId = {0 <repeats 16 times>}, fpVXIDLock = 0 '\000',
fpLocalTransactionId = 0}
Cheers,
Jeff
Hi Jeff, Heikki,
On 2015-07-31 09:48:28 -0700, Jeff Janes wrote:
I had run it for 24 hours, while it usually took less than 8 hours to lock
up before. I did see it within a few minutes one time when I checked out a
new HEAD and then forgot to re-apply your or Heikki's patch.
But now I've got the same situation again, after 15 hours, with your
patch. This is probably all down to luck. The only differences that I can
think of are that I advanced the base to e8e86fbc8b3619da54c, and turned on
JJ_vac and set log_autovacuum_min_duration=0.
It's quite possible that you hit the remaining race-condition that
Heikki was talking about. So that'd make it actually likely to be hit
slightly later - but as you say this is just a game of chance.
I've attached a version of the patch that should address Heikki's
concern. It imo also improves the API and increases debuggability by not
having stale variable values in the variables anymore. (also attached is
a minor optimization that Heikki has noticed)
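(Condensed from the attached patch, the resulting xlog.c call pattern looks
roughly like the following. This is a sketch of the new API shape rather
than a verbatim excerpt, and insertingAt here is just a placeholder for the
position being published.)

/* Acquire: the variable is no longer set as part of acquiring the lock. */
LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);

/* Publish the insert position explicitly while holding the lock. */
LWLockUpdateVar(&WALInsertLocks[MyLockNo].l.lock,
                &WALInsertLocks[MyLockNo].l.insertingAt,
                insertingAt);

/* Release: reset the variable so the next LWLockWaitForVar() blocks again. */
LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
                      &WALInsertLocks[MyLockNo].l.insertingAt,
                      0);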
I plan to commit the patch tomorrow, so it's included in alpha2.
Regards,
Andres
On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
I've attached a version of the patch that should address Heikki's
concern. It imo also improves the API and increases debuggability by not
having stale variable values in the variables anymore. (also attached is
a minor optimization that Heikki has noticed)
...
Attachments:
0001-Fix-issues-around-the-variable-support-in-the-lwlock.patch (text/x-patch; charset=us-ascii)
From 17b4e0692ebd91a92448b0abb8cf4438a93a29d6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 31 Jul 2015 20:20:43 +0200
Subject: [PATCH 1/2] Fix issues around the "variable" support in the lwlock
infrastructure.
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.
To fix, first move resetting the variable's value from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueuing, which prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.
Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
---
src/backend/access/transam/xlog.c | 34 ++++---
src/backend/storage/lmgr/lwlock.c | 193 +++++++++++++++++++++-----------------
src/include/storage/lwlock.h | 2 +-
3 files changed, 129 insertions(+), 100 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dd31b3..939813e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1408,9 +1408,7 @@ WALInsertLockAcquire(void)
* The insertingAt value is initially set to 0, as we don't know our
* insert location yet.
*/
- immed = LWLockAcquireWithVar(&WALInsertLocks[MyLockNo].l.lock,
- &WALInsertLocks[MyLockNo].l.insertingAt,
- 0);
+ immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
if (!immed)
{
/*
@@ -1435,26 +1433,28 @@ WALInsertLockAcquireExclusive(void)
int i;
/*
- * When holding all the locks, we only update the last lock's insertingAt
- * indicator. The others are set to 0xFFFFFFFFFFFFFFFF, which is higher
- * than any real XLogRecPtr value, to make sure that no-one blocks waiting
- * on those.
+ * When holding all the locks, all but the last lock's insertingAt
+ * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
+ * XLogRecPtr value, to make sure that no-one blocks waiting on those.
*/
for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
{
- LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
- &WALInsertLocks[i].l.insertingAt,
- PG_UINT64_MAX);
+ LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
+ LWLockUpdateVar(&WALInsertLocks[i].l.lock,
+ &WALInsertLocks[i].l.insertingAt,
+ PG_UINT64_MAX);
}
- LWLockAcquireWithVar(&WALInsertLocks[i].l.lock,
- &WALInsertLocks[i].l.insertingAt,
- 0);
+ /* Variable value reset to 0 at release */
+ LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
holdingAllLocks = true;
}
/*
* Release our insertion lock (or locks, if we're holding them all).
+ *
+ * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
+ * next time the lock is acquired.
*/
static void
WALInsertLockRelease(void)
@@ -1464,13 +1464,17 @@ WALInsertLockRelease(void)
int i;
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
- LWLockRelease(&WALInsertLocks[i].l.lock);
+ LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
+ &WALInsertLocks[i].l.insertingAt,
+ 0);
holdingAllLocks = false;
}
else
{
- LWLockRelease(&WALInsertLocks[MyLockNo].l.lock);
+ LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
+ &WALInsertLocks[MyLockNo].l.insertingAt,
+ 0);
}
}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index e5566d1..ae03eb1 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -10,13 +10,15 @@
* locking should be done with the full lock manager --- which depends on
* LWLocks to protect its shared state.
*
- * In addition to exclusive and shared modes, lightweight locks can be used
- * to wait until a variable changes value. The variable is initially set
- * when the lock is acquired with LWLockAcquireWithVar, and can be updated
+ * In addition to exclusive and shared modes, lightweight locks can be used to
+ * wait until a variable changes value. The variable is initially not set
+ * when the lock is acquired with LWLockAcquire, i.e. it remains set to the
+ * value it was set to when the lock was released last, and can be updated
* without releasing the lock by calling LWLockUpdateVar. LWLockWaitForVar
- * waits for the variable to be updated, or until the lock is free. The
- * meaning of the variable is up to the caller, the lightweight lock code
- * just assigns and compares it.
+ * waits for the variable to be updated, or until the lock is free. When
+ * releasing the lock with LWLockReleaseClearVar() the value can be set to an
+ * appropriate value for a free lock. The meaning of the variable is up to
+ * the caller, the lightweight lock code just assigns and compares it.
*
* Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -150,9 +152,6 @@ static LWLockHandle held_lwlocks[MAX_SIMUL_LWLOCKS];
static int lock_addin_request = 0;
static bool lock_addin_request_allowed = true;
-static inline bool LWLockAcquireCommon(LWLock *l, LWLockMode mode,
- uint64 *valptr, uint64 val);
-
#ifdef LWLOCK_STATS
typedef struct lwlock_stats_key
{
@@ -899,25 +898,7 @@ LWLockDequeueSelf(LWLock *lock)
* Side effect: cancel/die interrupts are held off until lock release.
*/
bool
-LWLockAcquire(LWLock *l, LWLockMode mode)
-{
- return LWLockAcquireCommon(l, mode, NULL, 0);
-}
-
-/*
- * LWLockAcquireWithVar - like LWLockAcquire, but also sets *valptr = val
- *
- * The lock is always acquired in exclusive mode with this function.
- */
-bool
-LWLockAcquireWithVar(LWLock *l, uint64 *valptr, uint64 val)
-{
- return LWLockAcquireCommon(l, LW_EXCLUSIVE, valptr, val);
-}
-
-/* internal function to implement LWLockAcquire and LWLockAcquireWithVar */
-static inline bool
-LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
+LWLockAcquire(LWLock *lock, LWLockMode mode)
{
PGPROC *proc = MyProc;
bool result = true;
@@ -1064,10 +1045,6 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
result = false;
}
- /* If there's a variable associated with this lock, initialize it */
- if (valptr)
- *valptr = val;
-
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), T_ID(lock), mode);
/* Add lock to list of locks held by this backend */
@@ -1259,6 +1236,71 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
}
/*
+ * Does the lwlock in its current state need to wait for the variable value to
+ * change?
+ *
+ * If we don't need to wait, and it's because the value of the variable has
+ * changed, store the current value in newval.
+ *
+ * *result is set to true if the lock was free, and false otherwise.
+ */
+static bool
+LWLockConflictsWithVar(LWLock *lock,
+ uint64 *valptr, uint64 oldval, uint64 *newval,
+ bool *result)
+{
+ bool mustwait;
+ uint64 value;
+#ifdef LWLOCK_STATS
+ lwlock_stats *lwstats;
+
+ lwstats = get_lwlock_stats_entry(lock);
+#endif
+
+ /*
+ * Test first to see if it the slot is free right now.
+ *
+ * XXX: the caller uses a spinlock before this, so we don't need a memory
+ * barrier here as far as the current usage is concerned. But that might
+ * not be safe in general.
+ */
+ mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+
+ if (!mustwait)
+ {
+ *result = true;
+ return false;
+ }
+
+ *result = false;
+
+ /*
+ * Read value using spinlock as we can't rely on atomic 64 bit
+ * reads/stores. TODO: On platforms with a way to do atomic 64 bit
+ * reads/writes the spinlock could be optimized away.
+ */
+#ifdef LWLOCK_STATS
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+ SpinLockAcquire(&lock->mutex);
+#endif
+ value = *valptr;
+ SpinLockRelease(&lock->mutex);
+
+ if (value != oldval)
+ {
+ mustwait = false;
+ *newval = value;
+ }
+ else
+ {
+ mustwait = true;
+ }
+
+ return mustwait;
+}
+
+/*
* LWLockWaitForVar - Wait until lock is free, or a variable is updated.
*
* If the lock is held and *valptr equals oldval, waits until the lock is
@@ -1268,11 +1310,6 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
* matches oldval, returns false and sets *newval to the current value in
* *valptr.
*
- * It's possible that the lock holder releases the lock, but another backend
- * acquires it again before we get a chance to observe that the lock was
- * momentarily released. We wouldn't need to wait for the new lock holder,
- * but we cannot distinguish that case, so we will have to wait.
- *
* Note: this function ignores shared lock holders; if the lock is held
* in shared mode, returns 'true'.
*/
@@ -1291,16 +1328,6 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
PRINT_LWDEBUG("LWLockWaitForVar", lock, LW_WAIT_UNTIL_FREE);
/*
- * Quick test first to see if it the slot is free right now.
- *
- * XXX: the caller uses a spinlock before this, so we don't need a memory
- * barrier here as far as the current usage is concerned. But that might
- * not be safe in general.
- */
- if ((pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) == 0)
- return true;
-
- /*
* Lock out cancel/die interrupts while we sleep on the lock. There is no
* cleanup mechanism to remove us from the wait queue if we got
* interrupted.
@@ -1313,39 +1340,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
for (;;)
{
bool mustwait;
- uint64 value;
-
- mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
-
- if (mustwait)
- {
- /*
- * Perform comparison using spinlock as we can't rely on atomic 64
- * bit reads/stores.
- */
-#ifdef LWLOCK_STATS
- lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
-#else
- SpinLockAcquire(&lock->mutex);
-#endif
- /*
- * XXX: We can significantly optimize this on platforms with 64bit
- * atomics.
- */
- value = *valptr;
- if (value != oldval)
- {
- result = false;
- mustwait = false;
- *newval = value;
- }
- else
- mustwait = true;
- SpinLockRelease(&lock->mutex);
- }
- else
- mustwait = false;
+ mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+ &result);
if (!mustwait)
break; /* the lock was free or value didn't match */
@@ -1354,7 +1351,9 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
- * LWLockAcquire(). Check its comments for details.
+ * LWLockAcquire(). Check its comments for details. The only
+ * difference is that we also have to check the variable's values when
+ * checking the state of the lock.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);
@@ -1365,12 +1364,13 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
/*
- * We're now guaranteed to be woken up if necessary. Recheck the
- * lock's state.
+ * We're now guaranteed to be woken up if necessary. Recheck the lock
+ * and variables state.
*/
- mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
+ mustwait = LWLockConflictsWithVar(lock, valptr, oldval, newval,
+ &result);
- /* Ok, lock is free after we queued ourselves. Undo queueing. */
+ /* Ok, no conflict after we queued ourselves. Undo queueing. */
if (!mustwait)
{
LOG_LWDEBUG("LWLockWaitForVar", lock, "free, undoing queue");
@@ -1587,6 +1587,31 @@ LWLockRelease(LWLock *lock)
RESUME_INTERRUPTS();
}
+/*
+ * LWLockReleaseClearVar - release a previously acquired lock, reset variable
+ */
+void
+LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val)
+{
+#ifdef LWLOCK_STATS
+ lwlock_stats *lwstats;
+
+ lwstats = get_lwlock_stats_entry(lock);
+ lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
+#else
+ SpinLockAcquire(&lock->mutex);
+#endif
+ /*
+ * Set the variable's value before releasing the lock, that prevents race
+ * a race condition wherein a new locker acquires the lock, but hasn't yet
+ * set the variables value.
+ */
+ *valptr = val;
+ SpinLockRelease(&lock->mutex);
+
+ LWLockRelease(lock);
+}
+
/*
* LWLockReleaseAll - release all currently-held locks
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index cff3b99..cbd6318 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -182,10 +182,10 @@ extern bool LWLockAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockConditionalAcquire(LWLock *lock, LWLockMode mode);
extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode);
extern void LWLockRelease(LWLock *lock);
+extern void LWLockReleaseClearVar(LWLock *lock, uint64 *valptr, uint64 val);
extern void LWLockReleaseAll(void);
extern bool LWLockHeldByMe(LWLock *lock);
-extern bool LWLockAcquireWithVar(LWLock *lock, uint64 *valptr, uint64 val);
extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval);
extern void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 value);
--
2.4.0.rc2.1.g3d6bc9a
0002-Micro-optimize-LWLockAttemptLock-a-bit.patch (text/x-patch; charset=us-ascii)
From 3560baf051857022808fbbc10d9b0d96dd8630af Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 31 Jul 2015 20:50:35 +0200
Subject: [PATCH 2/2] Micro optimize LWLockAttemptLock() a bit.
LWLockAttemptLock pointlessly read the lock's state in every loop
iteration, even though pg_atomic_compare_exchange_u32() returns the old
value. Instead do that only once before the loop iteration.
Additionally there's no need to have the expected_state variable,
old_state mostly had the same value anyway.
Noticed-By: Heikki Linnakangas
Backpatch: 9.5, no reason to let the branches diverge at this point
---
src/backend/storage/lmgr/lwlock.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ae03eb1..687ed63 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -582,29 +582,33 @@ LWLockInitialize(LWLock *lock, int tranche_id)
static bool
LWLockAttemptLock(LWLock *lock, LWLockMode mode)
{
+ uint32 old_state;
+
AssertArg(mode == LW_EXCLUSIVE || mode == LW_SHARED);
+ /*
+ * Read once outside the loop, later iterations will get the newer value
+ * via compare & exchange.
+ */
+ old_state = pg_atomic_read_u32(&lock->state);
+
/* loop until we've determined whether we could acquire the lock or not */
while (true)
{
- uint32 old_state;
- uint32 expected_state;
uint32 desired_state;
bool lock_free;
- old_state = pg_atomic_read_u32(&lock->state);
- expected_state = old_state;
- desired_state = expected_state;
+ desired_state = old_state;
if (mode == LW_EXCLUSIVE)
{
- lock_free = (expected_state & LW_LOCK_MASK) == 0;
+ lock_free = (old_state & LW_LOCK_MASK) == 0;
if (lock_free)
desired_state += LW_VAL_EXCLUSIVE;
}
else
{
- lock_free = (expected_state & LW_VAL_EXCLUSIVE) == 0;
+ lock_free = (old_state & LW_VAL_EXCLUSIVE) == 0;
if (lock_free)
desired_state += LW_VAL_SHARED;
}
@@ -620,7 +624,7 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
* Retry if the value changed since we last looked at it.
*/
if (pg_atomic_compare_exchange_u32(&lock->state,
- &expected_state, desired_state))
+ &old_state, desired_state))
{
if (lock_free)
{
--
2.4.0.rc2.1.g3d6bc9a
Andres Freund <andres@anarazel.de> writes:
I plan to commit the patch tomorrow, so it's included in alpha2.
Please try to commit anything you want in alpha2 *today*. I'd
prefer to see some successful buildfarm cycles on such patches
before we wrap.
regards, tom lane
On 2015-08-02 12:33:06 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I plan to commit the patch tomorrow, so it's included in alpha2.
Please try to commit anything you want in alpha2 *today*. I'd
prefer to see some successful buildfarm cycles on such patches
before we wrap.
Ok, will do that.
On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
I've attached a version of the patch that should address Heikki's
concern. It imo also improves the API and increases debuggability by not
having stale variable values in the variables anymore. (also attached is
a minor optimization that Heikki has noticed)
I've run these (from your git commit) for over 60 hours with no problem, so
I think it's solved.
If there are still problems, hopefully they will come out in other tests.
I don't know why the gin test was efficient at provoking the problem while
none of the other ones I've played around with (btree-upsert, gist, brin,
btree-foreign key, btree-prepare transaction) were. Perhaps it is just due
to the amount of WAL which gin generates.
Cheers,
Jeff
On 2015-08-05 11:22:55 -0700, Jeff Janes wrote:
On Sun, Aug 2, 2015 at 8:05 AM, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-02 17:04:07 +0200, Andres Freund wrote:
I've attached a version of the patch that should address Heikki's
concern. It imo also improves the API and increases debuggability by not
having stale variable values in the variables anymore. (also attached is
a minor optimization that Heikki has noticed)
I've run these (from your git commit) for over 60 hours with no problem, so
I think it's solved.
Cool! Thanks for the testing.
If there are still problems, hopefully they will come out in other tests.
I don't know why the gin test was efficient at provoking the problem while
none of the other ones I've played around with (btree-upsert, gist, brin,
btree-foreign key, btree-prepare transaction) were. Perhaps it is just due
to the amount of WAL which gin generates.
I'm not sure either... I've been able to reproduce a version of the
issue by judiciously sprinkling pg_usleep()'s over the code.
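(Purely as an illustration of that technique, and not the actual locations
or delays used: a delay dropped between releasing the spinlock and queuing
widens exactly the kind of window discussed earlier in the thread.)

SpinLockRelease(&lock->mutex);
pg_usleep(1000L);       /* hypothetical 1ms delay, for race testing only */
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE);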
Regards,
Andres