[sqlsmith] Unpinning error in parallel worker
Hi,
today's testing with master as of d253b0f6e3 yielded two clusters that
stopped processing queries. Symptoms:
regression=> select application_name, state, wait_event, wait_event_type, count(1), min(pid)
from pg_stat_activity group by 1,2,3,4;
application_name | state | wait_event | wait_event_type | count | min
------------------+--------+--------------------------------+-----------------+-------+------
psql | active | | | 1 | 3781
psql | active | DynamicSharedMemoryControlLock | LWLock | 2 | 3272
sqlsmith::dut | active | DynamicSharedMemoryControlLock | LWLock | 9 | 2726
sqlsmith::dut | active | MessageQueueSend | IPC | 1 | 2625
sqlsmith::dut | active | BgWorkerShutdown | IPC | 1 | 3655
sqlsmith::schema | idle | ClientRead | Client | 9 | 3634
(6 rows)
regression=> select application_name, state, wait_event, wait_event_type, count(1), min(pid)
from pg_stat_activity group by 1,2,3,4;
application_name | state | wait_event | wait_event_type | count | min
------------------+--------+--------------------------------+-----------------+-------+-------
psql | active | | | 1 | 29645
sqlsmith::dut | active | DynamicSharedMemoryControlLock | LWLock | 5 | 23167
sqlsmith::schema | idle | ClientRead | Client | 5 | 1169
sqlsmith::dut | active | BgWorkerShutdown | IPC | 1 | 1178
(4 rows)
Both sport the following last error in their logfiles:
FATAL: cannot unpin a segment that is not pinned
Below are the backtraces of the processes throwing them.
regards,
Andreas
Backtrace on dwagon:
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00000000006b7c02 in PGSemaphoreLock (sema=0x7f8aa19b20d8) at pg_sema.c:310
#2 0x0000000000726b04 in LWLockAcquire (lock=0x7f8aa19b3400, mode=LW_EXCLUSIVE) at lwlock.c:1233
#3 0x0000000000710b56 in dsm_detach (seg=0x12cf6a0) at dsm.c:760
#4 0x0000000000710e6e in dsm_backend_shutdown () at dsm.c:643
#5 0x000000000071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#6 0x000000000071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#7 0x0000000000711588 in proc_exit (code=code@entry=1) at ipc.c:102
#8 0x00000000008352ff in errfinish (dummy=dummy@entry=0) at elog.c:543
#9 0x0000000000838a8c in elog_finish (elevel=<optimized out>, fmt=0x9e8e70 "cannot unpin a segment that is not pinned") at elog.c:1378
#10 0x00000000007111ff in dsm_unpin_segment (handle=545777640) at dsm.c:917
#11 0x0000000000858205 in dsa_release_in_place (place=0x7f8aa524a178) at dsa.c:617
#12 0x0000000000710ae0 in dsm_detach (seg=0x12cf6a0) at dsm.c:734
#13 0x0000000000710e6e in dsm_backend_shutdown () at dsm.c:643
#14 0x000000000071144f in shmem_exit (code=code@entry=1) at ipc.c:248
#15 0x000000000071150e in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#16 0x0000000000711588 in proc_exit (code=code@entry=1) at ipc.c:102
#17 0x00000000008352ff in errfinish (dummy=<optimized out>) at elog.c:543
#18 0x000000000073425c in ProcessInterrupts () at postgres.c:2837
#19 0x00000000004baeb5 in heapgetpage (scan=scan@entry=0x1376138, page=page@entry=0) at heapam.c:373
#20 0x00000000004bbaf0 in heapgettup_pagemode (key=0x1376550, nkeys=1, dir=ForwardScanDirection, scan=0x1376138) at heapam.c:830
#21 heap_getnext (scan=0x1376138, direction=direction@entry=ForwardScanDirection) at heapam.c:1804
#22 0x00000000004cf2be in systable_getnext (sysscan=sysscan@entry=0x13760e0) at genam.c:435
#23 0x0000000000823760 in ScanPgRelation (targetRelId=<optimized out>, indexOK=<optimized out>, force_non_historic=force_non_historic@entry=0 '\000') at relcache.c:357
#24 0x0000000000823ad3 in RelationReloadIndexInfo (relation=relation@entry=0x7f8aa52321e8) at relcache.c:2235
#25 0x0000000000827dd8 in RelationIdGetRelation (relationId=relationId@entry=2662) at relcache.c:2091
#26 0x00000000004bb088 in relation_open (relationId=relationId@entry=2662, lockmode=lockmode@entry=1) at heapam.c:1128
#27 0x00000000004cf8f6 in index_open (relationId=relationId@entry=2662, lockmode=lockmode@entry=1) at indexam.c:155
#28 0x00000000004cf226 in systable_beginscan (heapRelation=heapRelation@entry=0x7f8aa10e64a0, indexId=2662, indexOK=<optimized out>, snapshot=snapshot@entry=0x0, nkeys=1, key=key@entry=0x7ffe7b2bd260) at genam.c:340
#29 0x000000000081ef9c in SearchCatCache (cache=0x132cd10, v1=v1@entry=16528, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1234
#30 0x000000000082d46e in SearchSysCache (cacheId=cacheId@entry=46, key1=key1@entry=16528, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:1108
#31 0x0000000000530b8c in pg_class_aclmask (table_oid=table_oid@entry=16528, roleid=roleid@entry=10, mask=mask@entry=2, how=how@entry=ACLMASK_ALL) at aclchk.c:3619
#32 0x000000000060944f in ExecCheckRTEPerms (rte=0x13702f8) at execMain.c:633
#33 ExecCheckRTPerms (rangeTable=0x1370698, ereport_on_violation=<optimized out>) at execMain.c:571
#34 0x0000000000609e1d in InitPlan (eflags=16, queryDesc=0x13a20e8) at execMain.c:812
#35 standard_ExecutorStart (queryDesc=0x13a20e8, eflags=16) at execMain.c:249
#36 0x000000000060c824 in ParallelQueryMain (seg=<optimized out>, toc=0x7f8aa524a000) at execParallel.c:849
#37 0x00000000004fa5aa in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1109
#38 0x00000000006ba5a2 in StartBackgroundWorker () at bgworker.c:792
#39 0x00000000006c5952 in do_start_bgworker (rw=0x12ef180) at postmaster.c:5577
#40 maybe_start_bgworker () at postmaster.c:5761
#41 0x00000000006c6376 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5015
#42 <signal handler called>
#43 0x00007f8aa45eeac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#44 0x0000000000476085 in ServerLoop () at postmaster.c:1693
#45 0x00000000006c7890 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x12cd540) at postmaster.c:1337
#46 0x000000000047784d in main (argc=4, argv=0x12cd540) at main.c:228
Backtrace on seabisquit:
#0 outNode (str=0x7fffbbf96dc0, obj=0x9) at outfuncs.c:3511
#1 0x000000000066ec2b in nodeToString (obj=0x9) at outfuncs.c:4160
#2 0x0000000000674b9b in pprint (obj=<optimized out>) at print.c:58
#3 <function called from gdb>
#4 0x00007fc5bfc25457 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7fc5bc603058) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#5 do_futex_wait (sem=sem@entry=0x7fc5bc603058, abstime=0x0) at sem_waitcommon.c:111
#6 0x00007fc5bfc25504 in __new_sem_wait_slow (sem=0x7fc5bc603058, abstime=0x0) at sem_waitcommon.c:181
#7 0x00007fc5bfc25599 in __new_sem_wait (sem=sem@entry=0x7fc5bc603058) at sem_wait.c:29
#8 0x00000000006bc322 in PGSemaphoreLock (sema=0x7fc5bc603058) at pg_sema.c:310
#9 0x000000000072b72c in LWLockAcquire (lock=0x7fc5bc604400, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1233
#10 0x00000000007156e0 in dsm_detach (seg=0x1ee36a0) at dsm.c:760
#11 0x0000000000715a0e in dsm_backend_shutdown () at dsm.c:643
#12 0x0000000000715fef in shmem_exit (code=code@entry=1) at ipc.c:248
#13 0x00000000007160ae in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#14 0x0000000000716128 in proc_exit (code=code@entry=1) at ipc.c:102
#15 0x00000000008378df in errfinish (dummy=dummy@entry=0) at elog.c:543
#16 0x000000000083b06c in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x9eb330 "cannot unpin a segment that is not pinned") at elog.c:1378
#17 0x0000000000715dad in dsm_unpin_segment (handle=1204218611) at dsm.c:917
#18 0x000000000085a835 in dsa_release_in_place (place=0x7fc5bfe891c8) at dsa.c:617
#19 0x000000000071566f in dsm_detach (seg=0x1ee36a0) at dsm.c:734
#20 0x0000000000715a0e in dsm_backend_shutdown () at dsm.c:643
#21 0x0000000000715fef in shmem_exit (code=code@entry=1) at ipc.c:248
#22 0x00000000007160ae in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#23 0x0000000000716128 in proc_exit (code=code@entry=1) at ipc.c:102
#24 0x00000000008378df in errfinish (dummy=<optimized out>) at elog.c:543
#25 0x0000000000738b9e in ProcessInterrupts () at postgres.c:2837
#26 0x00000000007393d9 in ProcessInterrupts () at postgres.c:2824
#27 0x0000000000610725 in ExecProcNode (node=node@entry=0x1fe7cb8) at execProcnode.c:396
#28 0x000000000060bfe6 in ExecutePlan (execute_once=<optimized out>, dest=0x1f6f640, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1fe7cb8, estate=0x1faa830) at execMain.c:1646
#29 standard_ExecutorRun (queryDesc=0x1fb7ad0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:355
#30 0x000000000060f993 in ParallelQueryMain (seg=<optimized out>, toc=0x7fc5bfe89000) at execParallel.c:856
#31 0x00000000004f9eba in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1109
#32 0x00000000006becd2 in StartBackgroundWorker () at bgworker.c:792
#33 0x00000000006ca042 in do_start_bgworker (rw=0x1f07780) at postmaster.c:5577
#34 maybe_start_bgworker () at postmaster.c:5761
#35 0x00000000006caa76 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5015
#36 <signal handler called>
#37 0x00007fc5bf243e53 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#38 0x0000000000475f5d in ServerLoop () at postmaster.c:1693
#39 0x00000000006cbfa2 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x1ee1540) at postmaster.c:1337
#40 0x0000000000477b6d in main (argc=4, argv=0x1ee1540) at main.c:228
On Mon, Mar 27, 2017 at 4:18 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
Hi,
today's testing with master as of d253b0f6e3 yielded two clusters that
stopped processing queries. Symptoms:[...]
Thanks Andreas. Investigating.
--
Thomas Munro
http://www.enterprisedb.com
On Mon, Mar 27, 2017 at 8:38 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Mon, Mar 27, 2017 at 4:18 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote:
Hi,
today's testing with master as of d253b0f6e3 yielded two clusters that
stopped processing queries. Symptoms:[...]
Thanks Andreas. Investigating.
First, the hanging stems from reentering dsm_backend_shutdown and
trying to acquire DynamicSharedMemoryControlLock which we already
acquired further down the stack in dsm_unpin_segment when it raised an
error. That's obviously not great, but the real question is how we
reached this this-cannot-happen error condition.
I reproduced this by inserting a sleep before dsa_attach_in_place,
inserting a call to dsa_allocate into ExecInitParallelPlan so that the
executor's DSA area owns at least one segment, and then cancelling a
parallel query before the sleepy worker has managed to attach. The
DSA area is destroyed before the worker attaches, but the worker
doesn't know this, and goes on to destroy it again after it learns
that the query has been cancelled.
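Concretely, the hack looked roughly like this (a sketch from memory, not
the actual test diff; the placement and the allocation size are
approximate, and the variable names are the ones used in those two
functions):

	/* Leader, in ExecInitParallelPlan(), once pei->area exists: force
	 * the DSA area to own at least one extra DSM segment by allocating
	 * more than fits in the in-place segment (size is a guess). */
	(void) dsa_allocate(pei->area, 1024 * 1024);

	/* Worker, in ParallelQueryMain(), just before attaching: leave a
	 * window in which the leader can cancel the query and destroy the
	 * area. */
	pg_usleep(5 * 1000000L);
	area = dsa_attach_in_place(area_space, seg);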
In an earlier version of DSA, attaching should have failed in this
scenario because the handle would be invalid. Based on complaints
about creating an extra DSM segment all the time even if we don't turn
out to need it, I implemented "in place" DSA areas where the control
object is in user-supplied shared memory, in this case in the parallel
query main DSM segment. But that created a new hazard: if you try to
attach to a piece of memory that contains the remains of an
already-destroyed DSA area, then we don't do anything to detect that.
Oops.
The attached patch fixes that one way: it detects refcnt == 0 as a
defunct DSA area and raises an error when you try to attach.
Another approach which I could explore would be to "reset" the DSA
area instead of destroying it when the last backend detaches. I'm not
sure if that would ever be useful as a feature in its own right, but
it would at least allow a very late worker to attach and then detach
in an orderly fashion in this query-cancelled case, so you wouldn't
get a different error message in the worker in this rare case.
Thoughts?
--
Thomas Munro
http://www.enterprisedb.com
Attachment: detect-late-dsa-attach-in-place.patch
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index 6d5d12a6c9..d08317249b 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -1314,6 +1314,13 @@ attach_internal(void *place, dsm_segment *segment, dsa_handle handle)
/* Bump the reference count. */
LWLockAcquire(DSA_AREA_LOCK(area), LW_EXCLUSIVE);
+ if (control->refcnt == 0)
+ {
+ /* We can't attach to a DSA area that has already been destroyed. */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("could not attach to dsa_area")));
+ }
++control->refcnt;
LWLockRelease(DSA_AREA_LOCK(area));
On Mon, Mar 27, 2017 at 6:53 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Mon, Mar 27, 2017 at 8:38 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
[...]
The attached patch fixes that one way: it detects refcnt == 0 as a
defunct DSA area and raises an error when you try to attach. [...]
Thoughts?
Added to open items.
I considered whether the error message could be improved but it
matches the message for an existing similar case (where you try to
attach to an unknown handle).
The alternative approach I mentioned above doesn't seem warranted, as
you can already get various different failure messages depending on
timing.
Based on feedback on another thread about how to make reviewers' and
committers' jobs easier, here is a format-patch version with a short
description as raw material for a commit message, in case that is
helpful.
--
Thomas Munro
http://www.enterprisedb.com
Attachment: detect-late-dsa-attach-v2.patch
From aec2add65853f2dca3cb138a524af8eb5c85f469 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Wed, 29 Mar 2017 17:30:46 +1300
Subject: [PATCH] Reject attempts to attach to a defunct DSA area.
As discovered by Andreas Seltenreich via sqlsmith, it is possible for a
late-starting parallel query worker to attach to the executor's DSA area
after all other backends have detached and it has already been destroyed,
leading to a second attempt to destroy it. Detect and prevent this case
when attaching.
Thomas Munro
Discussion: http://postgr.es/m/CAEepm%3D2VvKCVXjg-REHTikoBUOhtfvfCyiJrT%3DzHDJyvMNd9pw%40mail.gmail.com
---
src/backend/utils/mmgr/dsa.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index 6d5d12a6c9..d08317249b 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -1314,6 +1314,13 @@ attach_internal(void *place, dsm_segment *segment, dsa_handle handle)
/* Bump the reference count. */
LWLockAcquire(DSA_AREA_LOCK(area), LW_EXCLUSIVE);
+ if (control->refcnt == 0)
+ {
+ /* We can't attach to a DSA area that has already been destroyed. */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("could not attach to dsa_area")));
+ }
++control->refcnt;
LWLockRelease(DSA_AREA_LOCK(area));
--
2.12.2
On Wed, Mar 29, 2017 at 1:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I considered whether the error message could be improved but it
matches the message for an existing similar case (where you try to
attach to an unknown handle).
Ugh, OK. I committed this, but I think this whole file needs a visit
from the message style police.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thomas Munro writes:
Based on feedback on another thread about how to make reviewers' and
committers' jobs easier, here is a format-patch version with a short
description as raw material for a commit message, in case that is
helpful.
+1
It's quite convenient. Otherwise I have to get creative writing commit
messages myself in order to keep track of which patches are applied on
the branch sqlsmith is crunching on.
regards,
Andreas
On Thu, Mar 30, 2017 at 3:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 29, 2017 at 1:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I considered whether the error message could be improved but it
matches the message for an existing similar case (where you try to
attach to an unknown handle).
Ugh, OK. I committed this, but I think this whole file needs a visit
from the message style police.
Like this?
--
Thomas Munro
http://www.enterprisedb.com
Attachment: improve-error-messages.patch
diff --git a/src/backend/utils/mmgr/dsa.c b/src/backend/utils/mmgr/dsa.c
index d08317249b..c54cf4d57d 100644
--- a/src/backend/utils/mmgr/dsa.c
+++ b/src/backend/utils/mmgr/dsa.c
@@ -515,7 +515,7 @@ dsa_attach(dsa_handle handle)
if (segment == NULL)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("could not attach to dsa_handle")));
+ errmsg("could not attach to DSA handle")));
area = attach_internal(dsm_segment_address(segment), segment, handle);
@@ -1319,7 +1319,7 @@ attach_internal(void *place, dsm_segment *segment, dsa_handle handle)
/* We can't attach to a DSA area that has already been destroyed. */
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("could not attach to dsa_area")));
+ errmsg("could not attach to DSA area")));
}
++control->refcnt;
LWLockRelease(DSA_AREA_LOCK(area));
On Wed, Apr 5, 2017 at 3:59 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, Mar 30, 2017 at 3:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 29, 2017 at 1:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I considered whether the error message could be improved but it
matches the message for an existing similar case (where you try to
attach to an unknown handle).
Ugh, OK. I committed this, but I think this whole file needs a visit
from the message style police.
Like this?
I was thinking of maybe not creating two separate (translatable)
messages, and just using "could not attach to dynamic shared area" for
both.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 5, 2017 at 10:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Ugh, OK. I committed this, but I think this whole file needs a visit
from the message style police.
Like this?
I was thinking of maybe not creating two separate (translatable)
messages, and just using "could not attach to dynamic shared area" for
both.
Done that way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 29, 2017, at 10:50, Robert Haas wrote:
On Wed, Mar 29, 2017 at 1:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I considered whether the error message could be improved but it
matches the message for an existing similar case (where you try to
attach to an unknown handle).
Ugh, OK. I committed this, but I think this whole file needs a visit
from the message style police.
I ran into an issue that matches the symptoms of this bug on a production cluster running 10.3 (Ubuntu 10.3-1.pgdg16.04+1) twice in the past week.
Hundreds of queries stuck with a wait_event of DynamicSharedMemoryControlLock and pg_terminate_backend did not terminate the queries.
In the log:
FATAL: cannot unpin a segment that is not pinned
I don't have a backtrace yet, but I will provide them if/when the issue happens again.
Jonathan
On Wed, Apr 18, 2018 at 8:52 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
Hundreds of queries stuck with a wait_event of DynamicSharedMemoryControlLock and pg_terminate_backend did not terminate the queries.
In the log:
FATAL: cannot unpin a segment that is not pinned
Thanks for the report. That error is reachable via two paths:
1. Cleanup of a DSA area at the end of a query, giving back all
segments. This is how the bug originally reported in this thread
reached it, and that's because of a case where we tried to
double-destroy the DSA area when refcount went down to zero, then back
up again, and then back to zero (late starting parallel worker that
attached in a narrow time window). That was fixed in fddf45b3: once
it reaches zero we recognise it as already destroyed and don't even
let anyone attach.
2. In destroy_superblock(), called by dsa_free(), where we've
determined that a 64kb superblock can be given back to the DSM
segment, and that the DSM segment is now entirely free so can be given
back to the operating system. To do that, after we put the pages back
into the free page manager we test fpm_largest(segment_map->fpm) ==
segment_map->header->usable_pages to see if the largest span of free
pages is now the same size as the whole segment.
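In other words the tail of that path looks roughly like this
(paraphrased from memory using the names above -- not the literal dsa.c
code, and first_page/npages just stand for the span being returned):

	/* Paraphrase of path 2 (not the literal dsa.c source). */
	FreePageManagerPut(segment_map->fpm, first_page, npages);
	if (fpm_largest(segment_map->fpm) == segment_map->header->usable_pages)
	{
		/* The whole segment is free again, so unpin it and let the DSM
		 * layer remove it -- this is where dsm_unpin_segment() can raise
		 * "cannot unpin a segment that is not pinned". */
		dsm_unpin_segment(dsm_segment_handle(segment_map->segment));
	}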
I don't have any theories about how that could be going wrong right
now, but I'm looking into it. There could be a logic bug in dsa.c, or
a logic bug in client code running an invalid sequence of
dsa_allocate(), dsa_free() calls that corrupts state (I wonder if a
well timed double dsa_free() could produce this effect), or a
common-or-garden overrun bug somewhere that trashes control state.
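For example, the kind of illegal sequence I have in mind would be
something like this (purely hypothetical caller code, with "area"
standing for any dsa_area -- not anything found in the tree):

	/* Hypothetical misbehaving caller -- not real code from the tree. */
	dsa_pointer p = dsa_allocate(area, 100 * 1024);	/* big enough to get its own superblock */

	dsa_free(area, p);	/* superblock emptied; the segment may be unpinned and freed */
	dsa_free(area, p);	/* illegal second free: works on recycled control data and,
						 * with bad timing, might reach dsm_unpin_segment() for a
						 * segment that is no longer pinned */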
I don't have a backtrace yet, but I will provide them if/when the issue happens again.
Thanks, that would be much appreciated, as would any clues about what
workload you're running. Do you know what the query plan looks like
for the queries that crashed?
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Apr 17, 2018, at 18:38, Thomas Munro wrote:
I don't have any theories about how that could be going wrong right
now, but I'm looking into it.
Thank you!
I don't have a backtrace yet, but I will provide them if/when the issue happens again.
Thanks, that would be much appreciated, as would any clues about what
workload you're running. Do you know what the query plan looks like
for the queries that crashed?
Yep, I think I know approximately what it looked like, I've attached a lightly redacted plan. All of the hung queries were running some variant of this plan as far as I can tell.
Thanks again!
Jonathan
Attachment: lightly redacted query plan (not shown)
On Wed, Apr 18, 2018 at 11:01 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
On Tue, Apr 17, 2018, at 18:38, Thomas Munro wrote:
Thanks, that would be much appreciated, as would any clues about what
workload you're running. Do you know what the query plan looks like
for the queries that crashed?
Yep, I think I know approximately what it looked like, I've attached a lightly redacted plan. All of the hung queries were running some variant of this plan as far as I can tell.
Hmm, that isn't a parallel query. I was expecting to see "Gather" and
"Parallel" in there.
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Apr 17, 2018, at 19:31, Thomas Munro wrote:
On Wed, Apr 18, 2018 at 11:01 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
On Tue, Apr 17, 2018, at 18:38, Thomas Munro wrote:
Thanks, that would be much appreciated, as would any clues about what
workload you're running. Do you know what the query plan looks like
for the queries that crashed?
Yep, I think I know approximately what it looked like, I've attached a lightly redacted plan. All of the hung queries were running some variant of this plan as far as I can tell.
Hmm, that isn't a parallel query. I was expecting to see "Gather" and
"Parallel" in there.
Oops, I'm really sorry about that. I only have the first part of the hung queries, and there are a few variants. Here's one that's parallel.
Attachment: parallel query plan fragment (not shown)
On Wed, Apr 18, 2018 at 11:43 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
On Tue, Apr 17, 2018, at 19:31, Thomas Munro wrote:
On Wed, Apr 18, 2018 at 11:01 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
Yep, I think I know approximately what it looked like, I've attached a lightly redacted plan. All of the hung queries were running some variant of this plan as far as I can tell.
Hmm, that isn't a parallel query. I was expecting to see "Gather" and
"Parallel" in there.Oops, I'm really sorry about that. I only have the first part of the hung queries, and there are a few variants. Here's one that's parallel.
I spent some time trying to reproduce this failure without any luck,
using query plans similar to your Gather plan fragment, and using some
test harness code for the allocator stuff in isolation.
I had an idea that (1) freeing a large object that releases and unpins
a segment in one backend and then (2) freeing it again in another
backend (illegally) might produce this effect with sufficiently bad
luck. I'm still trying to reproduce that without any success, but I
get other kinds of failures which I think you'd be seeing too if that
hunch were right. Still looking...
--
Thomas Munro
http://www.enterprisedb.com
On Fri, Apr 20, 2018, at 00:42, Thomas Munro wrote:
[...]
Thanks for investigating! We have monitoring in place and a plan to grab stack traces if this happens again. Is there anything else that would be useful for us to try to get in addition to a stack trace from the process processing the stuck query?
On Fri, Apr 20, 2018, at 00:42, Thomas Munro wrote:
[...]
This issue happened again in production; here are stack traces from three of the backends, grabbed before we nuked the >400 hanging backends.
process: /proc/24194/mem
thread: 0x7f46263648c0, lwp: 24194, type: 1
#0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205
#1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181
#2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29
#3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310
#4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
#5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493
#6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266
#7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470
#8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158
#9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#10 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#11 0x000055fccb820293 in ExecAgg+1235() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2539
#12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97
#14 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#15 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#16 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#17 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#18 0x000055fccb8201eb in ExecAgg+1067() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2347
#19 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#20 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932
#21 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773
#22 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984
#23 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405
#24 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363
#25 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228
#26 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291
#27 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres
process: /proc/24231/mem
thread: 0x7f46263648c0, lwp: 24231, type: 1
#0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205
#1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181
#2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29
#3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310
#4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
#5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493
#6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266
#7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470
#8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158
#9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#10 0x000055fccb8348d6 in ExecSort+278() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#11 0x000055fccb834bd7 in ExecUnique+135() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97
#14 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#15 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932
#16 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773
#17 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984
#18 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405
#19 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363
#20 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228
#21 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291
#22 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres
process: /proc/24136/mem
thread: 0x7f46263648c0, lwp: 24136, type: 1
#0 0x00007f4625fdc827 in do_futex_wait+55() at /lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205
#1 0x00007f4625fdc8d4 in __new_sem_wait_slow+84() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181
#2 0x00007f4625fdc97a in __new_sem_wait+58() at /lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29
#3 0x000055fccb8c8832 in PGSemaphoreLock+34() at /usr/lib/postgresql/10/bin/postgres at pg_sema.c:310
#4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
#5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493
#6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266
#7 0x000055fccb815f27 in ExecInitParallelPlan+807() at /usr/lib/postgresql/10/bin/postgres at execParallel.c:470
#8 0x000055fccb825a50 in ExecGather+1168() at /usr/lib/postgresql/10/bin/postgres at nodeGather.c:158
#9 0x000055fccb81dbd3 in ExecAppend+83() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#10 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#11 0x000055fccb820293 in ExecAgg+1235() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2539
#12 0x000055fccb835235 in CteScanNext+229() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#13 0x000055fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres at execScan.c:97
#14 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#15 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#16 0x000055fccb832155 in ExecNestLoop+165() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#17 0x000055fccb81df2c in fetch_input_tuple+172() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#18 0x000055fccb8201eb in ExecAgg+1067() at /usr/lib/postgresql/10/bin/postgres at nodeAgg.c:2347
#19 0x000055fccb811dfb in standard_ExecutorRun+379() at /usr/lib/postgresql/10/bin/postgres at executor.h:250
#20 0x000055fccb94d19b in PortalRunSelect+507() at /usr/lib/postgresql/10/bin/postgres at pquery.c:932
#21 0x000055fccb94e798 in PortalRun+824() at /usr/lib/postgresql/10/bin/postgres at pquery.c:773
#22 0x000055fccb94b329 in PostgresMain+3721() at /usr/lib/postgresql/10/bin/postgres at postgres.c:1984
#23 0x000055fccb6840d9 in ServerLoop+3412() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:4405
#24 0x000055fccb8da78b in PostmasterMain+4235() at /usr/lib/postgresql/10/bin/postgres at postmaster.c:1363
#25 0x000055fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at main.c:228
#26 0x00007f4623c07830 in __libc_start_main+240() at /lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291
#27 0x000055fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres
On Wed, Apr 25, 2018 at 2:21 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
This issue happened again in production, here are the stack traces from three we grabbed before nuking the >400 hanging backends.
[...]
#4 0x000055fccb93b21c in LWLockAcquire+188() at /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
#5 0x000055fccb925fa7 in dsm_create+151() at /usr/lib/postgresql/10/bin/postgres at dsm.c:493
#6 0x000055fccb6f2a6f in InitializeParallelDSM+511() at /usr/lib/postgresql/10/bin/postgres at parallel.c:266
[...]
Thank you. These stacks are all blocked trying to acquire
DynamicSharedMemoryControlLock. My theory is that they can't because
one backend -- the one that emitted the error "FATAL: cannot unpin a
segment that is not pinned" -- is deadlocked against itself. After
emitting that error you can see from Andreas's "seabisquit" stack that
shmem_exit() runs dsm_backend_shutdown() which runs dsm_detach()
which tries to acquire DynamicSharedMemoryControlLock again, even
though we already hold it at that point.
I'll write a patch to fix that unpleasant symptom. While holding
DynamicSharedMemoryControlLock we shouldn't raise any errors without
releasing it first, because the error handling path will try to
acquire it again. That's a horrible failure mode as you have
discovered.
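The shape of the fix I have in mind for dsm_unpin_segment() is roughly
this (sketch only, not the final patch; "slot_is_pinned" stands for
however the control slot records the pin):

	/* Sketch of the fix: drop the lock before raising the error, so
	 * that the error path's own dsm_detach() calls can take it again
	 * without self-deadlocking. */
	if (!slot_is_pinned)
	{
		LWLockRelease(DynamicSharedMemoryControlLock);
		elog(ERROR, "cannot unpin a segment that is not pinned");
	}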
But that isn't the root problem: we shouldn't be raising that error,
and I'd love to see the stack of the one process that did that and
then self-deadlocked. I will have another go at trying to reproduce
it here today.
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Apr 24, 2018, at 16:06, Thomas Munro wrote:
[...]
I'll write a patch to fix that unpleasant symptom. [...]
Thanks for the update!
We have turned off parallel queries (using max_parallel_workers_per_gather = 0) for now, as the production impact of this bug is unfortunately quite problematic.
What will this failure look like with the patch you've proposed?
Thanks again,
Jonathan
On Wed, Apr 25, 2018 at 8:15 AM, Jonathan Rudenberg
<jonathan@titanous.com> wrote:
On Tue, Apr 24, 2018, at 16:06, Thomas Munro wrote:
[...]
Thanks for the update!
We have turned off parallel queries (using max_parallel_workers_per_gather = 0) for now, as the production impact of this bug is unfortunately quite problematic.
Apologies for the delay... I've tried a few times to reproduce the
problem, in vain. There may be a timing element here.
What will this failure look like with the patch you've proposed?
On second thoughts, I think it depends what is going wrong and I'm not
entirely sure if the result would necessarily be better. It might be
that errors are raised and normal service resumes or it might be that
we have tangled up our resources in a bad way. I need to get to the
bottom of this.
--
Thomas Munro
http://www.enterprisedb.com