LWLock deadlock and gdb advice

Started by Jeff Janesalmost 11 years ago41 messageshackers

jeff.janes@gmail.com

almost 11 years ago

I have a 9.5alpha1 cluster which is locked up. All the user back ends seem
to be waiting on semop, eventually on WALInsertLockAcquire.

Is there a way to use gdb to figure out who holds the lock they are waiting
for?

It is compiled with both debug and cassert.

I am hoping someone can give me recipe similar to the one for Examining
backend memory use in https://wiki.postgresql.org/wiki/Developer_FAQ

example backtrace:

#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x000000000067190f in PGSemaphoreLock (sema=0x7f28a98b9468) at
pg_sema.c:387
#2 0x00000000006d4b0c in LWLockAcquireCommon (l=0x7f28a0e6d600,
valptr=0x7f28a0e6d618, val=0) at lwlock.c:1042
#3 LWLockAcquireWithVar (l=0x7f28a0e6d600, valptr=0x7f28a0e6d618, val=0)
at lwlock.c:916
#4 0x00000000004f3c4f in WALInsertLockAcquire (rdata=0xc5c130, fpw_lsn=0)
at xlog.c:1411
#5 XLogInsertRecord (rdata=0xc5c130, fpw_lsn=0) at xlog.c:948
#6 0x00000000004f7aac in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#7 0x000000000047e0b0 in ginPlaceToPage (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#8 0x000000000047f3ad in ginInsertValue (btree=0x7fffca9263e0,
stack=0x2c94ff8, insertdata=0x7fffca926460, buildStats=0x0) at
ginbtree.c:748
#9 0x0000000000475c8b in ginEntryInsert (ginstate=0x7fffca9267e0,
attnum=29784, key=1, category=<value optimized out>, items=0x7f28a0c7b458,
nitem=47, buildStats=0x0)
at gininsert.c:234
#10 0x0000000000485ecc in ginInsertCleanup (ginstate=0x7fffca9267e0,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#11 0x0000000000487059 in ginHeapTupleFastInsert (ginstate=0x7fffca9267e0,
collector=<value optimized out>) at ginfast.c:436
#12 0x00000000004760fa in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531

Cheers,

Jeff

Peter Geoghegan

pg@bowt.ie

almost 11 years ago

In reply to: Jeff Janes (#1)

Re: LWLock deadlock and gdb advice

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a way to use gdb to figure out who holds the lock they are waiting
for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.

Otherwise, I suggest dereferencing the "l" argument to
LWLockAcquireWithVar() or something -- set the frame to "3" in your
example backtrace, using "f 3". Then, "p *l". You'll get the tranche
id there, and so on.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

almost 11 years ago

In reply to: Peter Geoghegan (#2)

Re: LWLock deadlock and gdb advice

On Tue, Jun 30, 2015 at 6:25 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a way to use gdb to figure out who holds the lock they are

waiting

for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.

If you define LOCK_DEBUG, then you can check owner of the
lock [1]#ifdef LOCK_DEBUG struct PGPROC *owner; /* last exlusive owner of the lock */ #endif } LWLock;, which will tell you about the Exclusive owner of that lock
and can help you in debugging the problem.

[1]: #ifdef LOCK_DEBUG struct PGPROC *owner; /* last exlusive owner of the lock */ #endif } LWLock;
#ifdef LOCK_DEBUG
struct PGPROC *owner; /* last exlusive owner of the lock */
#endif
} LWLock;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Peter Geoghegan (#2)

Re: LWLock deadlock and gdb advice

On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a way to use gdb to figure out who holds the lock they are

waiting

for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.

I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all the
children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.

Thanks,

Jeff

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Jeff Janes (#4)

Re: LWLock deadlock and gdb advice

On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a way to use gdb to figure out who holds the lock they are

waiting

for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.

I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.

I've gotten the LWLock deadlock again. User backend 24841 holds the
WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c)

The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.

I'm not sure where to go from here.

24841:

(gdb) f 2

(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11fae40, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13df080,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e11d4280, mode = LW_EXCLUSIVE}, {lock =
0x7f70e11d4280,
mode = LW_EXCLUSIVE}, {lock = 0x0, mode = LW_EXCLUSIVE} <repeats 196
times>}
(gdb) p T_NAME((LWLock *) held_lwlocks[1])
$2 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) held_lwlocks[1])

(gdb) p lock
$3 = (LWLock *) 0x7f70e13df000
(gdb) p T_NAME((LWLock *) lock)
$7 = 0x84cee1 "WALInsertLocks"
(gdb) p T_ID((LWLock *) lock)
$8 = 6

(gdb) p *(lock)
$1 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters
= {value = 2}, waiters = {head = {prev = 0x7f70e9e29e58, next =
0x7f70e9e2a140}},
owner = 0x7f70e9e2d260}

(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539754537, semNum =
0}, waitStatus = 0, procLatch = {is_set = 0, is_shared = 1 '\001',
owner_pid = 24820}, lxid = 0,
pid = 24820, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0,
recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0
'\000', lwWaitLink = {
prev = 0x7f70e13df090, next = 0x7f70e13df090}, waitLock = 0x0,
waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0,
syncRepState = 0, syncRepLinks = {prev = 0x0,
next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2d2f0, next =
0x7f70e9e2d2f0}, {prev = 0x7f70e9e2d300, next = 0x7f70e9e2d300}, {prev =
0x7f70e9e2d310, next = 0x7f70e9e2d310}, {
prev = 0x7f70e9e2d320, next = 0x7f70e9e2d320}, {prev =
0x7f70e9e2d330, next = 0x7f70e9e2d330}, {prev = 0x7f70e9e2d340, next =
0x7f70e9e2d340}, {prev = 0x7f70e9e2d350,
next = 0x7f70e9e2d350}, {prev = 0x7f70e9e2d360, next =
0x7f70e9e2d360}, {prev = 0x7f70e9e2d370, next = 0x7f70e9e2d370}, {prev =
0x7f70e9e2d380, next = 0x7f70e9e2d380}, {
prev = 0x7f70e9e2d390, next = 0x7f70e9e2d390}, {prev =
0x7f70e9e2d3a0, next = 0x7f70e9e2d3a0}, {prev = 0x7f70e9e2d3b0, next =
0x7f70e9e2d3b0}, {prev = 0x7f70e9e2d3c0,
next = 0x7f70e9e2d3c0}, {prev = 0x7f70e9e2d3d0, next =
0x7f70e9e2d3d0}, {prev = 0x7f70e9e2d3e0, next = 0x7f70e9e2d3e0}}, subxids =
{xids = {0 <repeats 64 times>}},
backendLock = 0x7f70e13dad00, fpLockBits = 0, fpRelId = {0 <repeats 16
times>}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0}

#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
pg_sema.c:387
#2 0x00000000006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3 0x00000000004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
xlog.c:1576
#4 0x00000000004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
opportunistic=<value optimized out>) at xlog.c:1888
#5 0x00000000004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6 0x00000000004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
fpw_lsn=<value optimized out>) at xlog.c:1313
#7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8 0x00000000004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9 0x000000000047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#10 0x000000000047f50d in ginInsertValue (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
ginbtree.c:748
#11 0x0000000000475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
nitem=51, buildStats=0x0)
at gininsert.c:234
#12 0x000000000048602c in ginInsertCleanup (ginstate=0x7fffb33cc470,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x00000000004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470,
collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047625a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
#15 0x00000000007dc483 in FunctionCall6Coll (flinfo=<value optimized out>,
collation=<value optimized out>, arg1=<value optimized out>, arg2=<value
optimized out>,
arg3=<value optimized out>, arg4=<value optimized out>,
arg5=140122789534360, arg6=0) at fmgr.c:1436
#16 0x00000000004b2247 in index_insert (indexRelation=0x7f70e1190e60,
values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794,
heapRelation=0x7f70e1185a98,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
#17 0x00000000005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0,
tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0,
arbiterIndexes=0x0)
at execIndexing.c:384
#18 0x00000000005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0,
slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768,
canSetTag=1 '\001')
at nodeModifyTable.c:978
#19 0x00000000005f0b2a in ExecModifyTable (node=0x1a409d0) at
nodeModifyTable.c:1436
#20 0x00000000005d6cc8 in ExecProcNode (node=0x1a409d0) at
execProcnode.c:389
#21 0x00000000005d5402 in ExecutePlan (queryDesc=0x1a2b220,
direction=<value optimized out>, count=0) at execMain.c:1549
#22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized
out>, count=0) at execMain.c:337
#23 0x00007f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
#24 0x00000000006ea38f in ProcessQuery (plan=0x1a2b3d8,
sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1",
params=0x1a2afe8, dest=<value optimized out>,
completionTag=0x7fffb33cf660 "") at pquery.c:185
#25 0x00000000006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1
'\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "")
at pquery.c:1279
#26 0x00000000006eadb3 in PortalRun (portal=0x19a6128,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630,
altdest=0x1990630, completionTag=0x7fffb33cf660 "")
at pquery.c:816
#27 0x00000000006e6ffb in exec_execute_message (portal_name=0x1990218 "",
max_rows=9223372036854775807) at postgres.c:1988
#28 0x00000000006e8c15 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value
optimized out>) at postgres.c:4088
#29 0x00000000006855dd in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4159
#30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3835
#31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1609
#32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#33 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221

24820:

(gdb) p held_lwlocks
$1 = {{lock = 0x7f70e11d4340, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13ded00,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e13ded80, mode = LW_EXCLUSIVE}, {lock =
0x7f70e13dee00,
mode = LW_EXCLUSIVE}, {lock = 0x7f70e13dee80, mode = LW_EXCLUSIVE},
{lock = 0x7f70e13def00, mode = LW_EXCLUSIVE}, {lock = 0x7f70e13def80, mode
= LW_EXCLUSIVE}, {
lock = 0x7f70e13df000, mode = LW_EXCLUSIVE}, {lock = 0x0, mode =
LW_EXCLUSIVE} <repeats 192 times>}

(gdb) p lock
$3 = (LWLock *) 0x7f70e13df080

(gdb) p *lock
$4 = {mutex = 0 '\000', tranche = 1, state = {value = 1627389952}, nwaiters
= {value = 1}, waiters = {head = {prev = 0x7f70e9e2d2a8, next =
0x7f70e9e2d2a8}},
owner = 0x7f70e9e2a0f8}

(gdb) p *(lock->owner)
$5 = {links = {prev = 0x0, next = 0x0}, sem = {semId = 539688999, semNum =
15}, waitStatus = 0, procLatch = {is_set = 1, is_shared = 1 '\001',
owner_pid = 24841}, lxid = 5523,
pid = 24841, pgprocno = 95, backendId = 9, databaseId = 16384, roleId =
10, recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode =
2 '\002', lwWaitLink = {
prev = 0x7f70e13df010, next = 0x7f70e9e29e58}, waitLock = 0x0,
waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0,
syncRepState = 0, syncRepLinks = {prev = 0x0,
next = 0x0}, myProcLocks = {{prev = 0x7f70e9e2a188, next =
0x7f70e9e2a188}, {prev = 0x7f70e9e2a198, next = 0x7f70e9e2a198}, {prev =
0x7f70e9e2a1a8, next = 0x7f70e9e2a1a8}, {
prev = 0x7f70e9bfe060, next = 0x7f70e9bfe060}, {prev =
0x7f70e9e2a1c8, next = 0x7f70e9e2a1c8}, {prev = 0x7f70e9e2a1d8, next =
0x7f70e9e2a1d8}, {prev = 0x7f70e9e2a1e8,
next = 0x7f70e9e2a1e8}, {prev = 0x7f70e9e2a1f8, next =
0x7f70e9e2a1f8}, {prev = 0x7f70e9e2a208, next = 0x7f70e9e2a208}, {prev =
0x7f70e9e2a218, next = 0x7f70e9e2a218}, {
prev = 0x7f70e9e2a228, next = 0x7f70e9e2a228}, {prev =
0x7f70e9e2a238, next = 0x7f70e9e2a238}, {prev = 0x7f70e9e2a248, next =
0x7f70e9e2a248}, {prev = 0x7f70e9e2a258,
next = 0x7f70e9e2a258}, {prev = 0x7f70e9e2a268, next =
0x7f70e9e2a268}, {prev = 0x7f70e9e2a278, next = 0x7f70e9e2a278}}, subxids =
{xids = {0 <repeats 64 times>}},
backendLock = 0x7f70e13da8c0, fpLockBits = 160528697655296, fpRelId = {0,
0, 0, 0, 0, 0, 0, 0, 0, 2679, 2610, 16414, 16412, 16416, 16415, 16409},
fpVXIDLock = 1 '\001',
fpLocalTransactionId = 5523}

(gdb) bt
#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2d270) at
pg_sema.c:387
#2 0x00000000006d70ac in LWLockAcquireCommon (lock=0x7f70e13df080,
mode=LW_EXCLUSIVE, valptr=0x7f70e13df0a8, val=0) at lwlock.c:1042
#3 0x00000000004e6dab in WALInsertLockAcquireExclusive () at xlog.c:1449
#4 0x00000000004ef025 in CreateCheckPoint (flags=64) at xlog.c:8177
#5 0x00000000006789d3 in CheckpointerMain () at checkpointer.c:493
#6 0x00000000004fecec in AuxiliaryProcessMain (argc=2,
argv=0x7fffb33cf1d0) at bootstrap.c:428
#7 0x0000000000680697 in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:5143
#8 0x0000000000682b7a in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2656
#9 <signal handler called>
#10 0x0000003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#11 0x00000000006849bf in ServerLoop (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1573
#12 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#13 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221

Show quoted text

Thanks,

Jeff

Alvaro Herrera

alvherre@2ndquadrant.com

almost 11 years ago

In reply to: Jeff Janes (#5)

Re: LWLock deadlock and gdb advice

Jeff Janes wrote:

I've gotten the LWLock deadlock again. User backend 24841 holds the
WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c)

Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.

#0 0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1 0x0000000000671aef in PGSemaphoreLock (sema=0x7f70e9e2a108) at
pg_sema.c:387
#2 0x00000000006d79ac in LWLockWaitForVar (lock=0x7f70e13df000,
valptr=0x7f70e13df028, oldval=0, newval=0x7fffb33cbd48) at lwlock.c:1406
#3 0x00000000004ee006 in WaitXLogInsertionsToFinish (upto=41590734848) at
xlog.c:1576
#4 0x00000000004ee94b in AdvanceXLInsertBuffer (upto=41594920960,
opportunistic=<value optimized out>) at xlog.c:1888
#5 0x00000000004f3c42 in GetXLogBuffer (ptr=41594920960) at xlog.c:1669
#6 0x00000000004f40e7 in CopyXLogRecordToWAL (rdata=0x19bc920,
fpw_lsn=<value optimized out>) at xlog.c:1313
#7 XLogInsertRecord (rdata=0x19bc920, fpw_lsn=<value optimized out>) at
xlog.c:1008
#8 0x00000000004f7c6c in XLogInsert (rmid=13 '\r', info=32 ' ') at
xloginsert.c:453
#9 0x000000000047e210 in ginPlaceToPage (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=<value optimized out>, updateblkno=<value
optimized out>, childbuf=0, buildStats=0x0)
at ginbtree.c:418
#10 0x000000000047f50d in ginInsertValue (btree=0x7fffb33cc070,
stack=0x1a5bfe0, insertdata=0x7fffb33cc0f0, buildStats=0x0) at
ginbtree.c:748
#11 0x0000000000475deb in ginEntryInsert (ginstate=0x7fffb33cc470,
attnum=5544, key=1, category=<value optimized out>, items=0x7f70e10d9140,
nitem=51, buildStats=0x0)
at gininsert.c:234
#12 0x000000000048602c in ginInsertCleanup (ginstate=0x7fffb33cc470,
vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x00000000004871b9 in ginHeapTupleFastInsert (ginstate=0x7fffb33cc470,
collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047625a in gininsert (fcinfo=<value optimized out>) at
gininsert.c:531
#15 0x00000000007dc483 in FunctionCall6Coll (flinfo=<value optimized out>,
collation=<value optimized out>, arg1=<value optimized out>, arg2=<value
optimized out>,
arg3=<value optimized out>, arg4=<value optimized out>,
arg5=140122789534360, arg6=0) at fmgr.c:1436
#16 0x00000000004b2247 in index_insert (indexRelation=0x7f70e1190e60,
values=0x7fffb33cef50, isnull=0x7fffb33cf050 "", heap_t_ctid=0x1a4a794,
heapRelation=0x7f70e1185a98,
checkUnique=UNIQUE_CHECK_NO) at indexam.c:226
#17 0x00000000005d2e67 in ExecInsertIndexTuples (slot=0x1a497f0,
tupleid=0x1a4a794, estate=0x1a40768, noDupErr=0 '\000', specConflict=0x0,
arbiterIndexes=0x0)
at execIndexing.c:384
#18 0x00000000005f0161 in ExecUpdate (tupleid=0x7fffb33cf250, oldtuple=0x0,
slot=0x1a497f0, planSlot=0x1a42498, epqstate=0x1a40a70, estate=0x1a40768,
canSetTag=1 '\001')
at nodeModifyTable.c:978
#19 0x00000000005f0b2a in ExecModifyTable (node=0x1a409d0) at
nodeModifyTable.c:1436
#20 0x00000000005d6cc8 in ExecProcNode (node=0x1a409d0) at
execProcnode.c:389
#21 0x00000000005d5402 in ExecutePlan (queryDesc=0x1a2b220,
direction=<value optimized out>, count=0) at execMain.c:1549
#22 standard_ExecutorRun (queryDesc=0x1a2b220, direction=<value optimized
out>, count=0) at execMain.c:337
#23 0x00007f70ea39af3b in pgss_ExecutorRun (queryDesc=0x1a2b220,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:881
#24 0x00000000006ea38f in ProcessQuery (plan=0x1a2b3d8,
sourceText=0x1a2af68 "update foo set count=count+1 where text_array @> $1",
params=0x1a2afe8, dest=<value optimized out>,
completionTag=0x7fffb33cf660 "") at pquery.c:185
#25 0x00000000006ea5ec in PortalRunMulti (portal=0x19a6128, isTopLevel=1
'\001', dest=0xc55020, altdest=0xc55020, completionTag=0x7fffb33cf660 "")
at pquery.c:1279
#26 0x00000000006eadb3 in PortalRun (portal=0x19a6128,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x1990630,
altdest=0x1990630, completionTag=0x7fffb33cf660 "")
at pquery.c:816
#27 0x00000000006e6ffb in exec_execute_message (portal_name=0x1990218 "",
max_rows=9223372036854775807) at postgres.c:1988
#28 0x00000000006e8c15 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x19a3210 "jjanes", username=<value
optimized out>) at postgres.c:4088
#29 0x00000000006855dd in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4159
#30 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3835
#31 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1609
#32 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1254
#33 0x0000000000610ab0 in main (argc=4, argv=0x1976cf0) at main.c:221

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Jeff Janes (#5)

Re: LWLock deadlock and gdb advice

On 06/30/2015 07:05 PM, Jeff Janes wrote:

On Mon, Jun 29, 2015 at 11:28 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jun 29, 2015 at 5:55 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Jun 29, 2015 at 5:37 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Is there a way to use gdb to figure out who holds the lock they are

waiting

for?

Have you considered building with LWLOCK_STATS defined, and LOCK_DEBUG
defined? That might do it.

I hadn't planned on running into this problem, so had not compiled
accordingly.

I thought LOCK_DEBUG would produce too much output, but now I see it
doesn't print anything unless Trace_lwlocks is on (but it still makes more
info available to gdb, as Amit mentioned), so I think that should be OK.

Since I messed up the gdb session causing the postmaster to SIGKILL all
the children and destroy the evidence, I'll repeat the run compiled with
LOCK_DEBUG and see what it looks like in the morning.

I've gotten the LWLock deadlock again. User backend 24841 holds the
WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c)

The Checkpoint process, meanwhile, has acquired WALInsertLocks 0 through 6
and is blocked on 7.

I'm not sure where to go from here.

The user backend 24841 is waiting in a LWLockWaitForVar() call, on
WALInsertLock 6, and oldval==0. The checkpointer is holding
WALInsertLock 6, but it has set the insertingat value to PG_UINT64_MAX.
The LWLockWaitForVar() call should not be blocked on that, because 0 !=
PG_UINT64_MAX.

Looks like the LWLock-scalability patch that made LWlock acquisiton to
use atomic ops instead of a spinlock broke this. LWLockWaitForVar() is
supposed to guarantee that it always wakes up from sleep, when the
variable's value changes. But there is now a race condition in
LWLockWaitForVar that wasn't there in 9.4:

if (mustwait)
{
/*
* Perform comparison using spinlock as we can't rely on atomic 64
* bit reads/stores.
*/
#ifdef LWLOCK_STATS
lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
#else
SpinLockAcquire(&lock->mutex);
#endif

/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;

if (!mustwait)
break; /* the lock was free or value didn't match */

/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf()
call, it's possible that another backend comes in, acquires the lock,
changes the variable's value, and releases the lock again. In 9.4, the
spinlock was not released until the process was queued.

Looking at LWLockAcquireWithVar(), it's also no longer holding the
spinlock while it updates the Var. That's not cool either :-(.
- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Alvaro Herrera (#6)

Re: LWLock deadlock and gdb advice

On 06/30/2015 07:37 PM, Alvaro Herrera wrote:

Jeff Janes wrote:

I've gotten the LWLock deadlock again. User backend 24841 holds the
WALInsertLocks 7 and is blocked attempting to acquire 6 . So it seems to
be violating the lock ordering rules (although I don't see that rule
spelled out in xlog.c)

Hmm, interesting -- pg_stat_statement is trying to re-do an operation
that involves updating a GIN index, while WAL-logging for the original
update is still ongoing, it seems.

I don't think pg_stat_statement has anything to do with this. You can
see from the backtrace that pg_stat_statement is enabled, as the call
went through the pgss_ExecutorRun executor hook, but that's all.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

almost 11 years ago

In reply to: Heikki Linnakangas (#7)

Re: LWLock deadlock and gdb advice

On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:

/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;

if (!mustwait)
break; /* the lock was free or value didn't match */

/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Andres Freund (#9)

Re: LWLock deadlock and gdb advice

On 06/30/2015 10:09 PM, Andres Freund wrote:

On 2015-06-30 21:08:53 +0300, Heikki Linnakangas wrote:

/*
* XXX: We can significantly optimize this on platforms with 64bit
* atomics.
*/
value = *valptr;
if (value != oldval)
{
result = false;
mustwait = false;
*newval = value;
}
else
mustwait = true;
SpinLockRelease(&lock->mutex);
}
else
mustwait = false;

if (!mustwait)
break; /* the lock was free or value didn't match */

/*
* Add myself to wait queue. Note that this is racy, somebody else
* could wakeup before we're finished queuing. NB: We're using nearly
* the same twice-in-a-row lock acquisition protocol as
* LWLockAcquire(). Check its comments for details.
*/
LWLockQueueSelf(lock, LW_WAIT_UNTIL_FREE, false);

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.

Looking at LWLockAcquireWithVar(), it's also no longer holding the spinlock
while it updates the Var. That's not cool either :-(.

Hm. I'd somehow assumed holding the lwlock ought to be sufficient, but
it really isn't. This var stuff isn't fitting all that well into the
lwlock code.

I'll take a stab at fixing this tomorrow. I take the blame for not
documenting the semantics of LWLockAcquireWithVar() and friends well
enough...

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Andres Freund

andres@anarazel.de

almost 11 years ago

In reply to: Heikki Linnakangas (#10)

Re: LWLock deadlock and gdb advice

On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.

I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or "just" the second issue?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Andres Freund (#11)

Re: LWLock deadlock and gdb advice

On 06/30/2015 11:24 PM, Andres Freund wrote:

On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient
to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.

I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or "just" the second issue?

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the
lock atomically with acquiring it. Before the lwlock-scalability
changes, that was straightforward because you held the spinlock anyway,
but it's a lot harder/expensive now. So I changed the way acquiring a
lock with a variable works. There is now a separate flag,
LW_FLAG_VAR_SET, which indicates that the current lock holder has
updated the variable. The LWLockAcquireWithVar function is gone - you
now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET
flag, and you can call LWLockUpdateVar() after that if you want to set
the variable immediately. LWLockWaitForVar() always waits if the flag is
not set, i.e. it will not return regardless of the variable's value, if
the current lock-holder has not updated it yet.

This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?

- Heikki

#13

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Heikki Linnakangas (#12)

Re: LWLock deadlock and gdb advice

On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.

I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.

bt and bt full are shown below.

Cheers,

Jeff

#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
#4 0x000000000093b53a in FunctionCall2Coll (flinfo=0x7fffee073e60,
collation=0, arg1=140737186840448, arg2=0) at fmgr.c:1323
#5 0x00000000004d5c7a in index_vacuum_cleanup (info=0x7fffee073f80,
stats=0x0) at indexam.c:717
#6 0x0000000000664f69 in lazy_cleanup_index (indrel=0x7faafbcace20,
stats=0x0, vacrelstats=0x28809c8) at vacuumlazy.c:1400
#7 0x0000000000664142 in lazy_scan_heap (onerel=0x7faafbcab6d0,
vacrelstats=0x28809c8, Irel=0x2881090, nindexes=2, scan_all=1 '\001') at
vacuumlazy.c:1111
#8 0x0000000000662905 in lazy_vacuum_rel (onerel=0x7faafbcab6d0,
options=65, params=0x286ea00, bstrategy=0x286ea90) at vacuumlazy.c:247
#9 0x00000000006623e4 in vacuum_rel (relid=16409, relation=0x7fffee074550,
options=65, params=0x286ea00) at vacuum.c:1377
#10 0x0000000000660bea in vacuum (options=65, relation=0x7fffee074550,
relid=16409, params=0x286ea00, va_cols=0x0, bstrategy=0x286ea90,
isTopLevel=1 '\001')
at vacuum.c:295
#11 0x000000000075caa9 in autovacuum_do_vac_analyze (tab=0x286e9f8,
bstrategy=0x286ea90) at autovacuum.c:2811
#12 0x000000000075be67 in do_autovacuum () at autovacuum.c:2331
#13 0x000000000075ac1c in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1648
#14 0x000000000075a84d in StartAutoVacWorker () at autovacuum.c:1455
#15 0x000000000076f745 in StartAutovacuumWorker () at postmaster.c:5261
#16 0x000000000076effd in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#17 <signal handler called>
#18 0x0000003dcb6e1353 in __select_nocancel () from /lib64/libc.so.6
#19 0x000000000076a8f0 in ServerLoop () at postmaster.c:1582
#20 0x000000000076a141 in PostmasterMain (argc=4, argv=0x27b2cf0) at
postmaster.c:1263
#21 0x00000000006c1e6e in main (argc=4, argv=0x27b2cf0) at main.c:223

#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
No locals.
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713
buffer = 186
page = 0x7faafc565b20 ""
info = 0x7fffee073f80
stats = 0x2880858
index = 0x7faafbcace20
needLock = 1 '\001'
npages = 22569
blkno = 12025
totFreePages = 11502
ginstate = {index = 0x7faafbcace20, oneCol = 1 '\001', origTupdesc
= 0x7faafbcad150, tupdesc = {0x7faafbcad150, 0x0 <repeats 31 times>},
compareFn = {{fn_addr = 0x90224d <bttextcmp>, fn_oid = 360,
fn_nargs = 2, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, extractValueFn = {{
fn_addr = 0x494adc <ginarrayextract>, fn_oid = 2743, fn_nargs
= 3, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, extractQueryFn = {{
fn_addr = 0x494beb <ginqueryarrayextract>, fn_oid = 2774,
fn_nargs = 7, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, consistentFn = {{
fn_addr = 0x494d67 <ginarrayconsistent>, fn_oid = 2744,
fn_nargs = 8, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, triConsistentFn = {{
fn_addr = 0x494efd <ginarraytriconsistent>, fn_oid = 3920,
fn_nargs = 7, fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2
'\002',
fn_extra = 0x0, fn_mcxt = 0x285e3c0, fn_expr = 0x0}, {fn_addr
= 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr =
0x0} <repeats 31 times>}, comparePartialFn = {{fn_addr = 0, fn_oid = 0,
fn_nargs = 0, fn_strict = 0 '\000', fn_retset = 0 '\000',
fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0,
fn_expr = 0x0} <repeats 32 times>}, canPartialMatch = '\000'
<repeats 31 times>, supportCollation = {100, 0 <repeats 31 times>}}
idxStat = {nPendingPages = 0, nTotalPages = 0, nEntryPages = 522,
nDataPages = 0, nEntries = 11000, ginVersion = 0}

#14

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Heikki Linnakangas (#12)

Re: LWLock deadlock and gdb advice

On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/30/2015 11:24 PM, Andres Freund wrote:

On 2015-06-30 22:19:02 +0300, Heikki Linnakangas wrote:

Hm. Right. A recheck of the value after the queuing should be sufficient

to fix? That's how we deal with the exact same scenarios for the normal
lock state, so that shouldn't be very hard to add.

Yeah. It's probably more efficient to not release the spinlock between
checking the value and LWLockQueueSelf() though.

Right now LWLockQueueSelf() takes spinlocks etc itself, and is used that
way in a bunch of callsites... So that'd be harder. Additionally I'm
planning to get rid of the spinlocks around queuing (they show up as
bottlenecks in contended workloads), so it seems more future proof not
to assume that either way. On top of that I think we should, when
available (or using the same type of fallback as for 32bit?), use 64 bit
atomics for the var anyway.

I'll take a stab at fixing this tomorrow.

Thanks! Do you mean both or "just" the second issue?

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.

This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?

After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince that this fixes the problem.

Cheers,

Jeff

#15

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Jeff Janes (#13)

Re: LWLock deadlock and gdb advice

On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi>
wrote:

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the
lock atomically with acquiring it. Before the lwlock-scalability changes,
that was straightforward because you held the spinlock anyway, but it's a
lot harder/expensive now. So I changed the way acquiring a lock with a
variable works. There is now a separate flag, LW_FLAG_VAR_SET, which
indicates that the current lock holder has updated the variable. The
LWLockAcquireWithVar function is gone - you now just use LWLockAcquire(),
which always clears the LW_FLAG_VAR_SET flag, and you can call
LWLockUpdateVar() after that if you want to set the variable immediately.
LWLockWaitForVar() always waits if the flag is not set, i.e. it will not
return regardless of the variable's value, if the current lock-holder has
not updated it yet.

I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.

bt and bt full are shown below.

Cheers,

Jeff

#0 0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1 0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2 0x0000000000930b7a in ExceptionalCondition (
conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >=
(__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc
"FailedAssertion",
fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3 0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at
ginvacuum.c:713

It now looks like this *is* unrelated to the LWLock issue. The assert that
it is tripping over was added just recently (302ac7f27197855afa8c) and so I
had not been testing under its presence until now. It looks like it is
finding all-zero pages (index extended but then a crash before initializing
the page?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently
orphaned.

Cheers,

Jeff

#16

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

almost 11 years ago

In reply to: Jeff Janes (#13)

Re: LWLock deadlock and gdb advice

On 2015-07-16 PM 04:03, Jeff Janes wrote:

On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears
the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
you want to set the variable immediately. LWLockWaitForVar() always waits
if the flag is not set, i.e. it will not return regardless of the
variable's value, if the current lock-holder has not updated it yet.

I ran this for a while without casserts and it seems to work. But with
casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it
without your patch. Perhaps the system just didn't survive long enough to
uncover it without the patch (although it shows up pretty quickly). It
could just be an overzealous Assert, since the casserts off didn't show
problems.

bt and bt full are shown below.

I got a similar assert failure but with a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:

(gdb) bt
#0 0x0000003969632625 in raise () from /lib64/libc.so.6
#1 0x0000003969633e05 in abort () from /lib64/libc.so.6
#2 0x000000000092eb9e in ExceptionalCondition (conditionName=0x9c2220
"!(((PageHeader) (page))->pd_special >= (__builtin_offsetof
(PageHeaderData, pd_linp)))",
errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
lineNumber=903) at assert.c:54
#3 0x00000000004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
orig_blkno=9) at nbtree.c:903
#4 0x00000000004e0067 in btvacuumscan (info=0x7fff2c765cd0,
stats=0x279f7d0, callback=0x668f6d <lazy_tid_reaped>,
callback_state=0x279e338, cycleid=49190)
at nbtree.c:821
#5 0x00000000004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6 0x0000000000939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at
fmgr.c:1375
#7 0x00000000004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
stats=0x0, callback=0x668f6d <lazy_tid_reaped>, callback_state=0x279e338)
at indexam.c:690
#8 0x000000000066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9 0x00000000006678a8 in lazy_scan_heap (onerel=0x274ec90,
vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at
vacuumlazy.c:1098
#10 0x00000000006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x0000000000665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x00000000006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
isTopLevel=1 '\001')
at vacuum.c:293
#13 0x000000000075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x000000000075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x000000000075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1647
#16 0x000000000075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x000000000076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x000000000076f28a in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#19 <signal handler called>
#20 0x00000039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x000000000076ace7 in ServerLoop () at postmaster.c:1582
#22 0x000000000076a538 in PostmasterMain (argc=3, argv=0x26f9330) at
postmaster.c:1263
#23 0x00000000006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Andres Freund

andres@anarazel.de

almost 11 years ago

In reply to: Jeff Janes (#14)

Re: LWLock deadlock and gdb advice

Hi,

On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:

After applying this patch to commit fdf28853ae6a397497b79f, it has survived
testing long enough to convince that this fixes the problem.

What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Jeff Janes

jeff.janes@gmail.com

almost 11 years ago

In reply to: Andres Freund (#17)

Re: LWLock deadlock and gdb advice

On Tue, Jul 28, 2015 at 7:06 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2015-07-19 11:49:14 -0700, Jeff Janes wrote:

After applying this patch to commit fdf28853ae6a397497b79f, it has

survived

testing long enough to convince that this fixes the problem.

What was the actual workload breaking with the bug? I ran a small
variety and I couldn't reproduce it yet. I'm not saying there's no bug,
I just would like to be able to test my version of the fixes...

It was the torn-page fault-injection code here:

https://drive.google.com/open?id=0Bzqrh1SO9FcEfkxFb05uQnJ2cWg0MEpmOXlhbFdyNEItNmpuek1zU2gySGF3Vk1oYXNNLUE

It is not a minimal set, I don't know if all parts of this are necessary to
rerproduce it. The whole crash-recovery cycling might not even be
important.

Compiled with:

./configure --enable-debug --with-libxml --with-perl --with-python
--with-ldap --with-openssl --with-gssapi
--prefix=/home/jjanes/pgsql/torn_bisect/

(Also with or without --enable-cassert).

I just ran "sh do.sh >& do.out" and eventually it stopped producing output,
and I find everything hung up.

Cheers,

Jeff

#19

Andres Freund

andres@anarazel.de

almost 11 years ago

In reply to: Heikki Linnakangas (#12)

Re: LWLock deadlock and gdb advice

Hi,

Finally getting to this.

On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears the
LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
want to set the variable immediately.

This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?

The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?

In an earlier email you say:

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.

But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?

I'll try to reproduce the problem now. But I do wonder if it's possibly
just the missing spinlock during the update.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Andres Freund (#19)

Re: LWLock deadlock and gdb advice

On 07/29/2015 02:39 PM, Andres Freund wrote:

On 2015-07-15 18:44:03 +0300, Heikki Linnakangas wrote:

Previously, LWLockAcquireWithVar set the variable associated with the lock
atomically with acquiring it. Before the lwlock-scalability changes, that
was straightforward because you held the spinlock anyway, but it's a lot
harder/expensive now. So I changed the way acquiring a lock with a variable
works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
the current lock holder has updated the variable. The LWLockAcquireWithVar
function is gone - you now just use LWLockAcquire(), which always clears the
LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you
want to set the variable immediately.

This passes make check, but I haven't done any testing beyond that. Does
this look sane to you?

The prime thing I dislike about the patch is how long it now holds the
spinlock in WaitForVar. I don't understand why that's necessary? There's
no need to hold a spinlock until after the
mustwait = (pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE) != 0;
unless I miss something?

In an earlier email you say:

After the spinlock is released above, but before the LWLockQueueSelf() call,
it's possible that another backend comes in, acquires the lock, changes the
variable's value, and releases the lock again. In 9.4, the spinlock was not
released until the process was queued.

But that's not a problem. The updater in that case will have queued a
wakeup for all waiters, including WaitForVar()?

If you release the spinlock before LWLockQueueSelf(), then no. It's
possible that the backend you wanted to wait for updates the variable's
value before you've queued up. Or even releases the lock, and it gets
re-acquired by another backend, before you've queued up.

Or are you thinking of just moving the SpinLockRelease to after
LWLockQueueSelf(), i.e. to the other side of the "mustwait = ..." line?
That'd probably be ok.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Andres Freund

andres@anarazel.de

almost 11 years ago

In reply to: Heikki Linnakangas (#20)

#22

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 11 years ago

In reply to: Andres Freund (#21)

#23

Andres Freund