[sqlsmith] Failed assertions on parallel worker shutdown

Started by Andreas Seltenreich over 9 years ago · 13 messages
#1 Andreas Seltenreich
seltenreich@gmx.de

There's another class of parallel worker core dumps when testing master
with sqlsmith. In these cases, the following assertion fails for all
workers simultaneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)

The backtrace of the controlling process is always in
ExecShutdownGatherWorkers. The queries always work fine on re-running,
so I guess there is some race condition on worker shutdown? Backtraces
below.

regards
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 30525 '.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5a3df92448 in __GI_abort () at abort.c:89
#2 0x00000000007eabe1 in ExceptionalCondition (conditionName=conditionName@entry=0x984e10 "!(mqh->mqh_partial_bytes <= nbytes)", errorType=errorType@entry=0x82a75d "FailedAssertion", fileName=fileName@entry=0x984b8c "shm_mq.c", lineNumber=lineNumber@entry=386) at assert.c:54
#3 0x00000000006d8042 in shm_mq_sendv (mqh=0x25f17b8, iov=iov@entry=0x7ffc6352af00, iovcnt=iovcnt@entry=1, nowait=<optimized out>) at shm_mq.c:386
#4 0x00000000006d807d in shm_mq_send (mqh=<optimized out>, nbytes=<optimized out>, data=<optimized out>, nowait=<optimized out>) at shm_mq.c:327
#5 0x00000000005d96b9 in ExecutePlan (dest=0x25f1850, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2612da8, estate=0x2612658) at execMain.c:1596
#6 standard_ExecutorRun (queryDesc=0x261a660, direction=<optimized out>, count=0) at execMain.c:338
#7 0x00000000005dc7cf in ParallelQueryMain (seg=<optimized out>, toc=0x7f5a3ea6c000) at execParallel.c:735
#8 0x00000000004e617b in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1035
#9 0x0000000000683862 in StartBackgroundWorker () at bgworker.c:726
#10 0x000000000068e9a2 in do_start_bgworker (rw=0x2590760) at postmaster.c:5531
#11 maybe_start_bgworker () at postmaster.c:5706
#12 0x000000000046cbba in ServerLoop () at postmaster.c:1762
#13 0x000000000069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298
#14 0x000000000046d80d in main (argc=4, argv=0x256d580) at main.c:228
(gdb) attach 30525
0x00007f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00000000006d1b4e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc6352aec0, cur_timeout=-1, set=0x44251c0) at latch.c:981
#2 WaitEventSetWait (set=set@entry=0x44251c0, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc6352aec0, nevents=nevents@entry=1) at latch.c:935
#3 0x00000000006d1f96 in WaitLatchOrSocket (latch=0x7f5a3d898494, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=timeout@entry=-1) at latch.c:347
#4 0x00000000006d205d in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=-1) at latch.c:302
#5 0x00000000004e6d64 in WaitForParallelWorkersToFinish (pcxt=0x442d4e8) at parallel.c:537
#6 0x00000000005dcf84 in ExecParallelFinish (pei=0x441cab8) at execParallel.c:541
#7 0x00000000005eeead in ExecShutdownGatherWorkers (node=node@entry=0x3e3a070) at nodeGather.c:416
#8 0x00000000005ef389 in ExecShutdownGather (node=0x3e3a070) at nodeGather.c:430
#9 0x00000000005dd03d in ExecShutdownNode (node=0x3e3a070) at execProcnode.c:807
#10 0x000000000061ad73 in planstate_tree_walker (planstate=0x3e361a8, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#11 0x000000000061ad73 in planstate_tree_walker (planstate=0xf323c30, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#12 0x000000000061ad73 in planstate_tree_walker (planstate=0xf323960, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#13 0x00000000005d96da in ExecutePlan (dest=0xb826868, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0xf323960, estate=0xf322b28) at execMain.c:1576
#14 standard_ExecutorRun (queryDesc=0xddca888, direction=<optimized out>, count=0) at execMain.c:338
#15 0x00000000006f6e88 in PortalRunSelect (portal=portal@entry=0x258ccc8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xb826868) at pquery.c:946
#16 0x00000000006f83ae in PortalRun (portal=0x258ccc8, count=9223372036854775807, isTopLevel=<optimized out>, dest=0xb826868, altdest=0xb826868, completionTag=0x7ffc6352b3d0 "") at pquery.c:787
#17 0x00000000006f5c63 in exec_simple_query (query_string=<optimized out>) at postgres.c:1094
#18 PostgresMain (argc=39374024, argv=0x25ed130, dbname=0x256e480 "regression", username=0x25ed308 "0\321^\002") at postgres.c:4059
#19 0x000000000046c8b2 in BackendRun (port=0x25935d0) at postmaster.c:4258
#20 BackendStartup (port=0x25935d0) at postmaster.c:3932
#21 ServerLoop () at postmaster.c:1690
#22 0x000000000069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298
#23 0x000000000046d80d in main (argc=4, argv=0x256d580) at main.c:228

#2 Andreas Seltenreich
seltenreich@gmx.de
In reply to: Andreas Seltenreich (#1)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

I wrote:

There's another class of parallel worker core dumps when testing master
with sqlsmith. In these cases, the following assertion fails for all
workers simultaneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)

I no longer observe these after applying these two patches by Amit
Kapila:

avoid_restricted_clause_below_gather_v1.patch
Message-ID: <CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com>

prohibit_parallel_clause_below_rel_v1.patch
Message-ID: <CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com>

I guess these failed assertions are fallout from the workers executing
forbidden plans without the patches.

regards,
Andreas

#3 Andreas Seltenreich
seltenreich@gmx.de
In reply to: Andreas Seltenreich (#2)
1 attachment(s)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

I wrote:

There's another class of parallel worker core dumps when testing master
with sqlsmith. In these cases, the following assertion fails for all
workers simultaneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)

I no longer observe these after applying these two patches by Amit
Kapila

I spoke too soon: These still occur with the patches applied, but with
much lower probability. (one core dump per 20e6 random queries instead
of 1e6).

Most of the collected plans look inconspicuous to me now, except for one
that again had a subplan below a gather node (plan6). Tarball of all
collected plans attached.

regards,
Andreas

Attachments:

plans.tar.gz (application/gzip)
#4 Amit Kapila
amit.kapila16@gmail.com
In reply to: Andreas Seltenreich (#3)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Mon, May 23, 2016 at 2:19 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

I wrote:

There's another class of parallel worker core dumps when testing master
with sqlsmith. In these cases, the following assertion fails for all
workers simultaneously:

TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)

I no longer observe these after applying these two patches by Amit
Kapila

I spoke too soon: These still occur with the patches applied, but with
much lower probability. (one core dump per 20e6 random queries instead
of 1e6).

Most of the collected plans look inconspicuous to me now, except for one
that again had a subplan below a gather node (plan6).

This problem looks different from the previous problems you have reported.
Earlier problems were due to some unsafe/restricted expressions being
pushed below the Gather node as part of the target list, whereas in
plan6 it seems some unsafe node is pushed below the Gather node. It
would be helpful if you could share the offending query.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#5 Andreas Seltenreich
seltenreich@gmx.de
In reply to: Amit Kapila (#4)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

Amit Kapila writes:

Earlier problems were due to some unsafe/restricted expressions being
pushed below the Gather node as part of the target list, whereas in
plan6 it seems some unsafe node is pushed below the Gather node. It
would be helpful if you could share the offending query.

plan6 corresponds to this query:

select
pg_catalog.anyarray_out(
cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41)
as anyarray)) as c0
from
public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p;

regards,
Andreas

#6 Amit Kapila
amit.kapila16@gmail.com
In reply to: Andreas Seltenreich (#5)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

Amit Kapila writes:

Earlier problems were due to some unsafe/restricted expressions being
pushed below the Gather node as part of the target list, whereas in
plan6 it seems some unsafe node is pushed below the Gather node. It
would be helpful if you could share the offending query.

plan6 corresponds to this query:

select
pg_catalog.anyarray_out(
cast((select most_common_vals from pg_catalog.pg_stats limit 1 offset 41)
as anyarray)) as c0
from
public.quad_point_tbl as ref_0 where ref_0.p ~= ref_0.p;

Are you sure that the core dumps you are seeing are due to plan6? I have
tried to generate a parallel plan for the above query, and it seems to me
that after applying the patches
(avoid_restricted_clause_below_gather_v1.patch and
prohibit_parallel_clause_below_rel_v1.patch) the plan it generates doesn't
have a subplan below the Gather node [1]. Without
avoid_restricted_clause_below_gather_v1.patch, a subplan can be pushed
below the Gather node, so I think either there is some other plan (query)
due to which you are seeing core dumps, or the above two patches haven't
been applied before testing. Is it possible that the core dump is due to
plan2 or some other similar plan? (I am not sure at this stage about the
cause of the problem you are seeing, but if for some reason PARAM_EXEC
params are pushed below Gather, then such a plan might not work.) If you
think a plan other than plan6 can cause such a problem, can you share the
query for plan2?

[1]:
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather  (cost=543.59..5041.17 rows=1761 width=32)
   Output: anyarray_out($0)
   Workers Planned: 1
   InitPlan 1 (returns $0)
     ->  Limit  (cost=491.56..543.59 rows=1 width=32)
           Output: pg_stats.most_common_vals
           ->  Subquery Scan on pg_stats  (cost=127.30..491.56 rows=7 width=32)
                 Output: pg_stats.most_common_vals
                 ->  Hash Join  (cost=127.30..491.49 rows=7 width=401)
                       Output: NULL::name, NULL::name, NULL::name, NULL::boolean, NULL::real, NULL::integer, NULL::real, CASE WHEN (s.stakind1 = 1) THEN s.stavalues1 WHEN (s.stakind2 = 1) THEN s.stavalues2 WHEN (s.stakind3 = 1) THEN s.stavalues3 WHEN (s.stakind4 = 1) THEN s.stavalues4 WHEN (s.stakind5 = 1) THEN s.stavalues5 ELSE NULL::anyarray END, NULL::real[], NULL::anyarray, NULL::real, NULL::anyarray, NULL::real[], NULL::real[]
                       Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))
                       Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
                       ->  Seq Scan on pg_catalog.pg_attribute a  (cost=0.00..318.11 rows=6097 width=6)
                             Output: a.attrelid, a.attname, a.atttypid, a.attstattarget, a.attlen, a.attnum, a.attndims, a.attcacheoff, a.atttypmod, a.attbyval, a.attstorage, a.attalign, a.attnotnull, a.atthasdef, a.attisdropped, a.attislocal, a.attinhcount, a.attcollation, a.attacl, a.attoptions, a.attfdwoptions
                             Filter: (NOT a.attisdropped)
                       ->  Hash  (cost=117.59..117.59 rows=647 width=180)
                             Output: s.stakind1, s.stavalues1, s.stakind2, s.stavalues2, s.stakind3, s.stavalues3, s.stakind4, s.stavalues4, s.stakind5, s.stavalues5, s.starelid, s.staattnum, c.oid
                             ->  Hash Join  (cost=57.21..117.59 rows=647 width=180)
                                   Output: s.stakind1, s.stavalues1, s.stakind2, s.stavalues2, s.stakind3, s.stavalues3, s.stakind4, s.stavalues4, s.stakind5, s.stavalues5, s.starelid, s.staattnum, c.oid
                                   Hash Cond: (s.starelid = c.oid)
                                   ->  Seq Scan on pg_catalog.pg_statistic s  (cost=0.00..51.48 rows=648 width=176)
                                         Output: s.starelid, s.staattnum, s.stainherit, s.stanullfrac, s.stawidth, s.stadistinct, s.stakind1, s.stakind2, s.stakind3, s.stakind4, s.stakind5, s.staop1, s.staop2, s.staop3, s.staop4, s.staop5, s.stanumbers1, s.stanumbers2, s.stanumbers3, s.stanumbers4, s.stanumbers5, s.stavalues1, s.stavalues2, s.stavalues3, s.stavalues4, s.stavalues5
                                   ->  Hash  (cost=45.61..45.61 rows=928 width=8)
                                         Output: c.oid, c.relnamespace
                                         ->  Seq Scan on pg_catalog.pg_class c  (cost=0.00..45.61 rows=928 width=8)
                                               Output: c.oid, c.relnamespace
                                               Filter: ((NOT c.relrowsecurity) OR (NOT row_security_active(c.oid)))
   ->  Parallel Seq Scan on public.quad_point_tbl ref_0  (cost=0.00..4493.18 rows=1036 width=0)
         Filter: (ref_0.p ~= ref_0.p)
(29 rows)

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#7 Andreas Seltenreich
seltenreich@gmx.de
In reply to: Amit Kapila (#6)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

Amit Kapila writes:

On Mon, May 23, 2016 at 4:48 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

plan6 corresponds to this query:

Are you sure that the core dumps you are seeing are due to plan6?

Each of the plans sent was harvested from a controlling process when the
above assertion failed in its workers. I do not know whether the plans
themselves really are at fault, as most of the collected plans look ok
to me. The backtrace in the controlling process always looks like the
one reported. (Except when the core dumping took so long as to trigger a
statement_timeout in the still-running master; there are no
plans/queries available in this case, as the state is no longer
available in an aborted transaction.)

I have tried to generate a parallel plan for the above query, and it seems
to me that after applying the patches
(avoid_restricted_clause_below_gather_v1.patch and
prohibit_parallel_clause_below_rel_v1.patch) the plan it generates doesn't
have a subplan below the Gather node [1].

Without avoid_restricted_clause_below_gather_v1.patch, a subplan can be
pushed below the Gather node, so I think either there is some other plan
(query) due to which you are seeing core dumps, or the above two patches
haven't been applied before testing.

According to my notes, the patches were applied in the instance that
crashed. Since I no longer see the other crash variants the patches fix,
and the probability of this failed assertion per random query is reduced
by about a factor of 20 compared to testing without the patches, I'm
pretty certain that this is not a bookkeeping error on my part.

Is it possible that the core dump is due to plan2 or some other similar
plan? (I am not sure at this stage about the cause of the problem you are
seeing, but if for some reason PARAM_EXEC params are pushed below Gather,
then such a plan might not work.) If you think a plan other than plan6 can
cause such a problem, can you share the query for plan2?

Each of the sent plans was collected when a worker dumped core due to
the failed assertion. More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on it and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select
pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
(select unique1 from public.bprime limit 1 offset 9) as c5,
subq_0.c2 as c6
from
(select ref_0.tablename as c0, ref_0.inherited as c1,
ref_0.histogram_bounds as c2, 100 as c3
from
pg_catalog.pg_stats as ref_0
where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---

regards,
Andreas

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Andreas Seltenreich (#7)
1 attachment(s)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

Each of the sent plans was collected when a worker dumped core due to
the failed assertion. More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on it and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select
pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
(select unique1 from public.bprime limit 1 offset 9) as c5,
subq_0.c2 as c6
from
(select ref_0.tablename as c0, ref_0.inherited as c1,
ref_0.histogram_bounds as c2, 100 as c3
from
pg_catalog.pg_stats as ref_0
where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---

I am able to reproduce the assertion you have reported upthread with the
above query (it occurs once in every two to three attempts, but always at
the same place). It seems to me the issue here is that while the workers
are writing tuples into the tuple queue, the master backend has already
detached from the queues. The reason the master backend detaches from the
tuple queues is the Limit clause: after processing the required number of
tuples as specified by the Limit clause, it calls shutdown of nodes in the
below part of code:

ExecutePlan()
{
..
if (TupIsNull(slot))
{
/* Allow nodes to release or shut down resources. */
(void) ExecShutdownNode(planstate);
break;
}
..
}

Now, it is quite possible that a worker has written part of its data,
after which the queue got detached. The callers of shm_mq
(tqueueReceiveSlot/shm_mq_send) don't handle SHM_MQ_DETACHED, so they keep
sending more data (the next tuple), which leads to the assertion in the
code below:
shm_mq_sendv()
{
..
/* Write the actual data bytes into the buffer. */
Assert(mqh->mqh_partial_bytes <= nbytes);
..
}
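
To make that sequence concrete, below is a minimal sketch of the unguarded
worker-side send loop (illustration only, not code from this thread; the
function name and the tuple-array loop are made up), assuming the 9.6
shm_mq API and a backend environment with the queue handle already set up:

#include "postgres.h"

#include "access/htup.h"
#include "storage/shm_mq.h"

/* Illustrative sketch: mirrors how tqueueReceiveSlot drives shm_mq_send. */
static void
send_tuples_unguarded(shm_mq_handle *mqh, HeapTuple *tuples, int ntuples)
{
    int         i;

    for (i = 0; i < ntuples; i++)
    {
        /*
         * If the leader detaches while this tuple is only partially
         * written, shm_mq_send() returns SHM_MQ_DETACHED and leaves
         * mqh_partial_bytes set.  Ignoring the result means the next
         * call re-enters shm_mq_sendv() with that stale state and can
         * fail Assert(mqh->mqh_partial_bytes <= nbytes).
         */
        (void) shm_mq_send(mqh, tuples[i]->t_len, tuples[i]->t_data, false);
    }
}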

I think the workers should stop processing tuples after the tuple queues
got detached. This will not only handle the above situation gracefully,
but will also speed up queries where a Limit clause is present on top of a
Gather node. A patch for the same is attached to this mail (this was part
of the original parallel seq scan patch, but was not applied; the reason,
as far as I remember, was that we thought such an optimization might not
be required for the initial version).

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

I have added this issue in list of PostgreSQL 9.6 open items @
https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items

The steps to reproduce it on the regression database are:
1. We need enough rows in pg_statistic so that a parallel plan can be
selected.
2. Use the below procedure to ensure sufficient rows get created in
pg_statistic:
CREATE OR REPLACE FUNCTION populate_pg_stat() RETURNS int AS
$BODY$
DECLARE
count int;
BEGIN
FOR count IN 1 .. 50000
LOOP
Execute 'Create table t' || count || '(c1 int)';
Execute 'insert into t' || count || ' values (generate_series(1,10))';
Execute 'Analyze t' || count;
End Loop;
Return 1;
END
$BODY$
LANGUAGE plpgsql;
3.
set parallel_tuple_cost = 0
set parallel_setup_cost = 0
execute query -
Explain Analyze select
pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
(select unique1 from public.bprime limit 1 offset 9) as c5,
subq_0.c2 as c6
from
(select ref_0.tablename as c0, ref_0.inherited as c1,
ref_0.histogram_bounds as c2, 100 as c3
from
pg_catalog.pg_stats as ref_0
where 49 is not NULL limit 55) as subq_0
where true
limit 58;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

stop_processing_tuples_detached_queue_v1.patch (application/octet-stream)
diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index 1939ff5..d9664aa 100644
--- a/src/backend/access/common/printtup.c
+++ b/src/backend/access/common/printtup.c
@@ -26,9 +26,9 @@
 
 static void printtup_startup(DestReceiver *self, int operation,
 				 TupleDesc typeinfo);
-static void printtup(TupleTableSlot *slot, DestReceiver *self);
-static void printtup_20(TupleTableSlot *slot, DestReceiver *self);
-static void printtup_internal_20(TupleTableSlot *slot, DestReceiver *self);
+static bool printtup(TupleTableSlot *slot, DestReceiver *self);
+static bool printtup_20(TupleTableSlot *slot, DestReceiver *self);
+static bool printtup_internal_20(TupleTableSlot *slot, DestReceiver *self);
 static void printtup_shutdown(DestReceiver *self);
 static void printtup_destroy(DestReceiver *self);
 
@@ -299,7 +299,7 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs)
  *		printtup --- print a tuple in protocol 3.0
  * ----------------
  */
-static void
+static bool
 printtup(TupleTableSlot *slot, DestReceiver *self)
 {
 	TupleDesc	typeinfo = slot->tts_tupleDescriptor;
@@ -376,13 +376,15 @@ printtup(TupleTableSlot *slot, DestReceiver *self)
 	/* Return to caller's context, and flush row's temporary memory */
 	MemoryContextSwitchTo(oldcontext);
 	MemoryContextReset(myState->tmpcontext);
+
+	return true;
 }
 
 /* ----------------
  *		printtup_20 --- print a tuple in protocol 2.0
  * ----------------
  */
-static void
+static bool
 printtup_20(TupleTableSlot *slot, DestReceiver *self)
 {
 	TupleDesc	typeinfo = slot->tts_tupleDescriptor;
@@ -452,6 +454,8 @@ printtup_20(TupleTableSlot *slot, DestReceiver *self)
 	/* Return to caller's context, and flush row's temporary memory */
 	MemoryContextSwitchTo(oldcontext);
 	MemoryContextReset(myState->tmpcontext);
+
+	return true;
 }
 
 /* ----------------
@@ -528,7 +532,7 @@ debugStartup(DestReceiver *self, int operation, TupleDesc typeinfo)
  *		debugtup - print one tuple for an interactive backend
  * ----------------
  */
-void
+bool
 debugtup(TupleTableSlot *slot, DestReceiver *self)
 {
 	TupleDesc	typeinfo = slot->tts_tupleDescriptor;
@@ -553,6 +557,8 @@ debugtup(TupleTableSlot *slot, DestReceiver *self)
 		printatt((unsigned) i + 1, typeinfo->attrs[i], value);
 	}
 	printf("\t----\n");
+
+	return true;
 }
 
 /* ----------------
@@ -564,7 +570,7 @@ debugtup(TupleTableSlot *slot, DestReceiver *self)
  * This is largely same as printtup_20, except we use binary formatting.
  * ----------------
  */
-static void
+static bool
 printtup_internal_20(TupleTableSlot *slot, DestReceiver *self)
 {
 	TupleDesc	typeinfo = slot->tts_tupleDescriptor;
@@ -636,4 +642,6 @@ printtup_internal_20(TupleTableSlot *slot, DestReceiver *self)
 	/* Return to caller's context, and flush row's temporary memory */
 	MemoryContextSwitchTo(oldcontext);
 	MemoryContextReset(myState->tmpcontext);
+
+	return true;
 }
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3201476..28dcd34 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -4454,7 +4454,7 @@ copy_dest_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 /*
  * copy_dest_receive --- receive one tuple
  */
-static void
+static bool
 copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_copy    *myState = (DR_copy *) self;
@@ -4466,6 +4466,8 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
 	/* And send the data */
 	CopyOneRowTo(cstate, InvalidOid, slot->tts_values, slot->tts_isnull);
 	myState->processed++;
+
+	return true;
 }
 
 /*
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index cb7a145..5a853c4 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -62,7 +62,7 @@ typedef struct
 static ObjectAddress CreateAsReladdr = {InvalidOid, InvalidOid, 0};
 
 static void intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo);
-static void intorel_receive(TupleTableSlot *slot, DestReceiver *self);
+static bool intorel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void intorel_shutdown(DestReceiver *self);
 static void intorel_destroy(DestReceiver *self);
 
@@ -482,7 +482,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 /*
  * intorel_receive --- receive one tuple
  */
-static void
+static bool
 intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
@@ -507,6 +507,8 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 				myState->bistate);
 
 	/* We know this is a newly created relation, so there are no indexes */
+
+	return true;
 }
 
 /*
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f00aab3..62e61a2 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -56,7 +56,7 @@ typedef struct
 static int	matview_maintenance_depth = 0;
 
 static void transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo);
-static void transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
+static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
 static void refresh_matview_datafill(DestReceiver *dest, Query *query,
@@ -467,7 +467,7 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 /*
  * transientrel_receive --- receive one tuple
  */
-static void
+static bool
 transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
@@ -486,6 +486,8 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 				myState->bistate);
 
 	/* We know this is a newly created relation, so there are no indexes */
+
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ac02304..b5ced38 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1593,7 +1593,15 @@ ExecutePlan(EState *estate,
 		 * practice, this is probably always the case at this point.)
 		 */
 		if (sendTuples)
-			(*dest->receiveSlot) (slot, dest);
+		{
+			/*
+			 * If we are not able to send the tuple, we assume the destination
+			 * has closed and no more tuples can be sent. If that's the case,
+			 * end the loop.
+			 */
+			if (!((*dest->receiveSlot) (slot, dest)))
+				break;
+		}
 
 		/*
 		 * Count tuples processed, if this is a SELECT.  (For other operation
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index dfe5622..0b4505c 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1313,7 +1313,7 @@ do_tup_output(TupOutputState *tstate, Datum *values, bool *isnull)
 	ExecStoreVirtualTuple(slot);
 
 	/* send the tuple to the receiver */
-	(*tstate->dest->receiveSlot) (slot, tstate->dest);
+	(void) (*tstate->dest->receiveSlot) (slot, tstate->dest);
 
 	/* clean up */
 	ExecClearTuple(slot);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 6e14c9d..cd93c04 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -167,7 +167,7 @@ static Datum postquel_get_single_result(TupleTableSlot *slot,
 static void sql_exec_error_callback(void *arg);
 static void ShutdownSQLFunction(Datum arg);
 static void sqlfunction_startup(DestReceiver *self, int operation, TupleDesc typeinfo);
-static void sqlfunction_receive(TupleTableSlot *slot, DestReceiver *self);
+static bool sqlfunction_receive(TupleTableSlot *slot, DestReceiver *self);
 static void sqlfunction_shutdown(DestReceiver *self);
 static void sqlfunction_destroy(DestReceiver *self);
 
@@ -1904,7 +1904,7 @@ sqlfunction_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 /*
  * sqlfunction_receive --- receive one tuple
  */
-static void
+static bool
 sqlfunction_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_sqlfunction *myState = (DR_sqlfunction *) self;
@@ -1914,6 +1914,8 @@ sqlfunction_receive(TupleTableSlot *slot, DestReceiver *self)
 
 	/* Store the filtered tuple into the tuplestore */
 	tuplestore_puttupleslot(myState->tstore, slot);
+
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 23cb6f4..7ccabdb 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1774,7 +1774,7 @@ spi_dest_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
  *		store tuple retrieved by Executor into SPITupleTable
  *		of current SPI procedure
  */
-void
+bool
 spi_printtup(TupleTableSlot *slot, DestReceiver *self)
 {
 	SPITupleTable *tuptable;
@@ -1809,6 +1809,8 @@ spi_printtup(TupleTableSlot *slot, DestReceiver *self)
 	(tuptable->free)--;
 
 	MemoryContextSwitchTo(oldcxt);
+
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/tqueue.c b/src/backend/executor/tqueue.c
index 383b535..8abb1f1 100644
--- a/src/backend/executor/tqueue.c
+++ b/src/backend/executor/tqueue.c
@@ -115,12 +115,13 @@ static RemapInfo *BuildRemapInfo(TupleDesc tupledesc);
  * type over a range type over a range type over an array type over a record,
  * or something like that.
  */
-static void
+static bool
 tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
 {
 	TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
 	TupleDesc	tupledesc = slot->tts_tupleDescriptor;
 	HeapTuple	tuple;
+	shm_mq_result result;
 
 	/*
 	 * Test to see whether the tupledesc has changed; if so, set up for the
@@ -195,7 +196,16 @@ tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
 	}
 
 	/* Send the tuple itself. */
-	shm_mq_send(tqueue->handle, tuple->t_len, tuple->t_data, false);
+	result = shm_mq_send(tqueue->handle, tuple->t_len, tuple->t_data, false);
+
+	if (result == SHM_MQ_DETACHED)
+		return false;
+	else if (result != SHM_MQ_SUCCESS)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("unable to send tuples")));
+
+	return true;
 }
 
 /*
diff --git a/src/backend/executor/tstoreReceiver.c b/src/backend/executor/tstoreReceiver.c
index 516440a..8f1e1b3 100644
--- a/src/backend/executor/tstoreReceiver.c
+++ b/src/backend/executor/tstoreReceiver.c
@@ -37,8 +37,8 @@ typedef struct
 } TStoreState;
 
 
-static void tstoreReceiveSlot_notoast(TupleTableSlot *slot, DestReceiver *self);
-static void tstoreReceiveSlot_detoast(TupleTableSlot *slot, DestReceiver *self);
+static bool tstoreReceiveSlot_notoast(TupleTableSlot *slot, DestReceiver *self);
+static bool tstoreReceiveSlot_detoast(TupleTableSlot *slot, DestReceiver *self);
 
 
 /*
@@ -90,19 +90,21 @@ tstoreStartupReceiver(DestReceiver *self, int operation, TupleDesc typeinfo)
  * Receive a tuple from the executor and store it in the tuplestore.
  * This is for the easy case where we don't have to detoast.
  */
-static void
+static bool
 tstoreReceiveSlot_notoast(TupleTableSlot *slot, DestReceiver *self)
 {
 	TStoreState *myState = (TStoreState *) self;
 
 	tuplestore_puttupleslot(myState->tstore, slot);
+
+	return true;
 }
 
 /*
  * Receive a tuple from the executor and store it in the tuplestore.
  * This is for the case where we have to detoast any toasted values.
  */
-static void
+static bool
 tstoreReceiveSlot_detoast(TupleTableSlot *slot, DestReceiver *self)
 {
 	TStoreState *myState = (TStoreState *) self;
@@ -152,6 +154,8 @@ tstoreReceiveSlot_detoast(TupleTableSlot *slot, DestReceiver *self)
 	/* And release any temporary detoasted values */
 	for (i = 0; i < nfree; i++)
 		pfree(DatumGetPointer(myState->tofree[i]));
+
+	return true;
 }
 
 /*
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index 2c7dc6e..de45cbc 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -45,9 +45,10 @@
  *		dummy DestReceiver functions
  * ----------------
  */
-static void
+static bool
 donothingReceive(TupleTableSlot *slot, DestReceiver *self)
 {
+	return true;
 }
 
 static void
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fcdc4c3..3f6cb12 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1109,7 +1109,13 @@ RunFromStore(Portal portal, ScanDirection direction, uint64 count,
 			if (!ok)
 				break;
 
-			(*dest->receiveSlot) (slot, dest);
+			/*
+			 * If we are not able to send the tuple, we assume the destination
+			 * has closed and no more tuples can be sent. If that's the case,
+			 * end the loop.
+			 */
+			if (!((*dest->receiveSlot) (slot, dest)))
+				break;
 
 			ExecClearTuple(slot);
 
diff --git a/src/include/access/printtup.h b/src/include/access/printtup.h
index 64dde01..608c564 100644
--- a/src/include/access/printtup.h
+++ b/src/include/access/printtup.h
@@ -25,11 +25,11 @@ extern void SendRowDescriptionMessage(TupleDesc typeinfo, List *targetlist,
 
 extern void debugStartup(DestReceiver *self, int operation,
 			 TupleDesc typeinfo);
-extern void debugtup(TupleTableSlot *slot, DestReceiver *self);
+extern bool debugtup(TupleTableSlot *slot, DestReceiver *self);
 
 /* XXX these are really in executor/spi.c */
 extern void spi_dest_startup(DestReceiver *self, int operation,
 				 TupleDesc typeinfo);
-extern void spi_printtup(TupleTableSlot *slot, DestReceiver *self);
+extern bool spi_printtup(TupleTableSlot *slot, DestReceiver *self);
 
 #endif   /* PRINTTUP_H */
diff --git a/src/include/tcop/dest.h b/src/include/tcop/dest.h
index 4e42d61..dd80433 100644
--- a/src/include/tcop/dest.h
+++ b/src/include/tcop/dest.h
@@ -104,7 +104,9 @@ typedef enum
  *		pointers that the executor must call.
  *
  * Note: the receiveSlot routine must be passed a slot containing a TupleDesc
- * identical to the one given to the rStartup routine.
+ * identical to the one given to the rStartup routine.  It returns bool where
+ * a "true" value means "continue processing" and a "false" value means
+ * "stop early, just as if we'd reached the end of the scan".
  * ----------------
  */
 typedef struct _DestReceiver DestReceiver;
@@ -112,7 +114,7 @@ typedef struct _DestReceiver DestReceiver;
 struct _DestReceiver
 {
 	/* Called for each tuple to be output: */
-	void		(*receiveSlot) (TupleTableSlot *slot,
+	bool		(*receiveSlot) (TupleTableSlot *slot,
 											DestReceiver *self);
 	/* Per-executor-run initialization and shutdown: */
 	void		(*rStartup) (DestReceiver *self,
#9 Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#8)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Thu, May 26, 2016 at 03:27:40PM +0530, Amit Kapila wrote:

I think the workers should stop processing tuples after the tuple queues
got detached. This will not only handle the above situation gracefully,
but will also speed up queries where a Limit clause is present on top of a
Gather node. A patch for the same is attached to this mail (this was part
of the original parallel seq scan patch, but was not applied; the reason,
as far as I remember, was that we thought such an optimization might not
be required for the initial version).

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

[Action required within 72 hours. This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com

#10 Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#8)
1 attachment(s)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Thu, May 26, 2016 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, May 24, 2016 at 6:36 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

Each of the sent plans was collected when a worker dumped core due to
the failed assertion. More core dumps than plans were actually
observed, since with this failed assertion, multiple workers usually
trip on it and dump core simultaneously.

The following query corresponds to plan2:

--8<---------------cut here---------------start------------->8---
select
pg_catalog.pg_stat_get_bgwriter_requested_checkpoints() as c0,
subq_0.c3 as c1, subq_0.c1 as c2, 31 as c3, 18 as c4,
(select unique1 from public.bprime limit 1 offset 9) as c5,
subq_0.c2 as c6
from
(select ref_0.tablename as c0, ref_0.inherited as c1,
ref_0.histogram_bounds as c2, 100 as c3
from
pg_catalog.pg_stats as ref_0
where 49 is not NULL limit 55) as subq_0
where true
limit 58;
--8<---------------cut here---------------end--------------->8---

I am able to reproduce the assertion you have reported upthread with the
above query (it occurs once in every two to three attempts, but always at
the same place). It seems to me the issue here is that while the workers
are writing tuples into the tuple queue, the master backend has already
detached from the queues. The reason the master backend detaches from the
tuple queues is the Limit clause: after processing the required number of
tuples as specified by the Limit clause, it calls shutdown of nodes in the
below part of code:

I can't reproduce this assertion failure on master. I tried running
'make installcheck' and then running this query repeatedly in the
regression database with and without
parallel_setup_cost=parallel_tuple_cost=0, and got nowhere. Does that
work for you, or do you have some other set of steps?

I think the workers should stop processing tuples after the tuple queues
got detached. This will not only handle the above situation gracefully,
but will also speed up queries where a Limit clause is present on top of a
Gather node. A patch for the same is attached to this mail (this was part
of the original parallel seq scan patch, but was not applied; the reason,
as far as I remember, was that we thought such an optimization might not
be required for the initial version).

This is very likely a good idea, but...

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

...I think we should do this too, because it's intended that calling
shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
again return SHM_MQ_DETACHED, not fail an assertion. Can you see
whether the attached patch fixes this for you?
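
In other words, the intended contract is roughly the following (an
illustrative fragment, not test code from this thread; mqh and iov stand
for an already-attached queue handle and a prepared iovec):

    res = shm_mq_sendv(mqh, iov, 1, true);
    if (res == SHM_MQ_DETACHED)
    {
        /* A later call must also report SHM_MQ_DETACHED, not hit the Assert. */
        res = shm_mq_sendv(mqh, iov, 1, true);
        Assert(res == SHM_MQ_DETACHED);
    }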

(Status update for Noah: I will provide another update regarding this
issue no later than Monday COB, US time. I assume that Amit will have
responded by then, and it should hopefully be clear what the next step
is at that point.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

dont-fail-mq-assert-v1.patch (binary/octet-stream)
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index 7d1c9cd..5ed0cef 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -366,9 +366,12 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 		res = shm_mq_send_bytes(mqh, sizeof(Size) - mqh->mqh_partial_bytes,
 								((char *) &nbytes) +mqh->mqh_partial_bytes,
 								nowait, &bytes_written);
-		mqh->mqh_partial_bytes += bytes_written;
-		if (res != SHM_MQ_SUCCESS)
+		if (res == SHM_MQ_DETACHED)
+		{
+			mqh->mqh_partial_bytes = 0;
 			return res;
+		}
+		mqh->mqh_partial_bytes += bytes_written;
 
 		if (mqh->mqh_partial_bytes >= sizeof(Size))
 		{
@@ -378,6 +381,9 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 			mqh->mqh_length_word_complete = true;
 		}
 
+		if (res != SHM_MQ_SUCCESS)
+			return res;
+
 		/* Length word can't be split unless bigger than required alignment. */
 		Assert(mqh->mqh_length_word_complete || sizeof(Size) > MAXIMUM_ALIGNOF);
 	}
@@ -434,6 +440,11 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 			}
 			res = shm_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
 			mqh->mqh_partial_bytes += bytes_written;
+			if (res == SHM_MQ_DETACHED)
+			{
+				mqh->mqh_partial_bytes = 0;
+				return res;
+			}
 			if (res != SHM_MQ_SUCCESS)
 				return res;
 			continue;
@@ -449,6 +460,11 @@ shm_mq_sendv(shm_mq_handle *mqh, shm_mq_iovec *iov, int iovcnt, bool nowait)
 			chunksize = MAXALIGN_DOWN(chunksize);
 		res = shm_mq_send_bytes(mqh, chunksize, &iov[which_iov].data[offset],
 								nowait, &bytes_written);
+		if (res == SHM_MQ_DETACHED)
+		{
+			mqh->mqh_partial_bytes = 0;
+			return res;
+		}
 		mqh->mqh_partial_bytes += bytes_written;
 		offset += bytes_written;
 		if (res != SHM_MQ_SUCCESS)
#11 Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#10)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Sat, Jun 4, 2016 at 8:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 26, 2016 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com>

wrote:

I am able to reproduce the assertion you have reported upthread with the
above query (it occurs once in every two to three attempts, but always at
the same place). It seems to me the issue here is that while the workers
are writing tuples into the tuple queue, the master backend has already
detached from the queues. The reason the master backend detaches from the
tuple queues is the Limit clause: after processing the required number of
tuples as specified by the Limit clause, it calls shutdown of nodes in the
below part of code:

I can't reproduce this assertion failure on master. I tried running
'make installcheck' and then running this query repeatedly in the
regression database with and without
parallel_setup_cost=parallel_tuple_cost=0, and got nowhere. Does that
work for you, or do you have some other set of steps?

I have tried by populating the pg_statistic table after running make
installcheck. The way to populate pg_statistic is to create a lot of
tables and insert a few rows in each table, as mentioned at the end of the
mail upthread:
/messages/by-id/CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com

Today I again tried reproducing it using the same steps, but could not.
This is a timing issue and difficult to reproduce; last time I also spent
quite some time to reproduce it. I think we can fix the issue as per my
earlier analysis and then let Andreas run his tests to see whether he
still sees the issue.

I think the workers should stop processing tuples after the tuple queues
got detached. This will not only handle the above situation gracefully,
but will also speed up queries where a Limit clause is present on top of a
Gather node. A patch for the same is attached to this mail (this was part
of the original parallel seq scan patch, but was not applied; the reason,
as far as I remember, was that we thought such an optimization might not
be required for the initial version).

This is very likely a good idea, but...

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

...I think we should do this too, because it's intended that calling
shm_mq_sendv again after it previously returned SHM_MQ_DETACHED should
again return SHM_MQ_DETACHED, not fail an assertion.

  res = shm_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
  mqh->mqh_partial_bytes += bytes_written;
+ if (res == SHM_MQ_DETACHED)
+ {
+ mqh->mqh_partial_bytes = 0;
+ return res;
+ }

In the above change, you first add bytes_written and then do the
SHM_MQ_DETACHED check, whereas in the other place the check is done first,
which seems right. I think it doesn't matter either way, but it is better
to be consistent. Also, isn't it better to set mqh_length_word_complete to
false, since the next time this API is called it should first try to write
the length word into the buffer?
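
For illustration, a sketch of the ordering being suggested here, as a
fragment inside shm_mq_sendv (this is not the committed code, just the
shape of the change):

    res = shm_mq_send_bytes(mqh, j, tmpbuf, nowait, &bytes_written);
    if (res == SHM_MQ_DETACHED)
    {
        /* Reset state so a later call cleanly reports SHM_MQ_DETACHED. */
        mqh->mqh_partial_bytes = 0;
        mqh->mqh_length_word_complete = false;
        return res;
    }
    mqh->mqh_partial_bytes += bytes_written;
    if (res != SHM_MQ_SUCCESS)
        return res;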

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#11)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Sat, Jun 4, 2016 at 2:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

In the above change, you first add bytes_written and then do the
SHM_MQ_DETACHED check, whereas in the other place the check is done first,
which seems right. I think it doesn't matter either way, but it is better
to be consistent. Also, isn't it better to set mqh_length_word_complete to
false, since the next time this API is called it should first try to write
the length word into the buffer?

OK, committed after (I hope) fixing those issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#10)
Re: [sqlsmith] Failed assertions on parallel worker shutdown

On Fri, Jun 3, 2016 at 11:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think the workers should stop processing tuples after the tuple queues
got detached. This will not only handle the above situation gracefully,
but will also speed up queries where a Limit clause is present on top of a
Gather node. A patch for the same is attached to this mail (this was part
of the original parallel seq scan patch, but was not applied; the reason,
as far as I remember, was that we thought such an optimization might not
be required for the initial version).

This is very likely a good idea, but...

Another approach to fix this issue could be to reset mqh_partial_bytes
and mqh_length_word_complete in shm_mq_sendv in case of SHM_MQ_DETACHED.
These are currently reset only in case of success.

...I think we should do this too

Patches for both of these things are now committed. Barring problems
with those commits, I believe this resolves this open item.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
