[sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Started by Andreas Seltenreichalmost 10 years ago24 messageshackers
Jump to latest
#1Andreas Seltenreich
seltenreich@gmx.de

Hi,

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.

TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)

Sample backtraces of a worker and leader below, plan of leader attached.
The collected queries don't seem to reproduce it. Curiously, running
explain on them on the failed instance after crash recovery never shows
any gather nodes…

regards,
andreas

Core was generated by `postgres: bgworker: parallel worker for PID 28062 '.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#3 0x000000000061bad2 in list_nth_cell (list=0x0, n=<optimized out>) at list.c:390
#4 0x000000000061bb26 in list_nth (list=<optimized out>, n=<optimized out>) at list.c:413
#5 0x00000000005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at nodeSubplan.c:705
#6 0x00000000005e3b54 in ExecInitExpr (node=0x1522a08, parent=parent@entry=0x1538188) at execQual.c:4724
#7 0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#8 0x00000000005ff3fc in ExecInitAlternativeSubPlan (asplan=asplan@entry=0x1522060, parent=parent@entry=0x1538188) at nodeSubplan.c:1185
#9 0x00000000005e35c4 in ExecInitExpr (node=0x1522060, parent=parent@entry=0x1538188) at execQual.c:4740
#10 0x00000000005e3f8a in ExecInitExpr (node=0x1522978, parent=parent@entry=0x1538188) at execQual.c:4845
#11 0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#12 0x00000000005e3687 in ExecInitExpr (node=0x1522920, parent=parent@entry=0x1538188) at execQual.c:4648
#13 0x00000000005e330f in ExecInitExpr (node=0x15228c8, parent=parent@entry=0x1538188) at execQual.c:5132
#14 0x00000000005e326f in ExecInitExpr (node=0x1522870, parent=parent@entry=0x1538188) at execQual.c:5152
#15 0x00000000005e415c in ExecInitExpr (node=<optimized out>, parent=parent@entry=0x1538188) at execQual.c:5164
#16 0x00000000005fbb62 in ExecInitSeqScan (node=0x1522728, estate=0x15379b8, eflags=16) at nodeSeqscan.c:192
#17 0x00000000005dd567 in ExecInitNode (node=0x1522728, estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:192
#18 0x00000000005f12a5 in ExecInitHashJoin (node=0x1522530, estate=0x15379b8, eflags=16) at nodeHashjoin.c:489
#19 0x00000000005dd497 in ExecInitNode (node=node@entry=0x1522530, estate=estate@entry=0x15379b8, eflags=eflags@entry=16) at execProcnode.c:275
#20 0x00000000005dae6c in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:959
#21 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:238
#22 0x00000000005dcac4 in ParallelQueryMain (seg=<optimized out>, toc=0x7f442d27b000) at execParallel.c:729
#23 0x00000000004e631b in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1033
#24 0x0000000000683af2 in StartBackgroundWorker () at bgworker.c:726
#25 0x000000000068ec32 in do_start_bgworker (rw=0x14c4a20) at postmaster.c:5531
#26 maybe_start_bgworker () at postmaster.c:5706
#27 0x000000000068f686 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4967
#28 <signal handler called>
#29 0x00007f442c839ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#30 0x000000000046c144 in ServerLoop () at postmaster.c:1654
#31 0x0000000000690aae in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x14a1560) at postmaster.c:1298
#32 0x000000000046d78d in main (argc=4, argv=0x14a1560) at main.c:228
(gdb) frame 5
#5 0x00000000005fe566 in ExecInitSubPlan (subplan=subplan@entry=0x1522a08, parent=parent@entry=0x1538188) at nodeSubplan.c:705
(gdb) list
704 /* Link the SubPlanState to already-initialized subplan */
705 sstate->planstate = (PlanState *) list_nth(estate->es_subplanstates,
706 subplan->plan_id - 1);

(gdb) attach 28062
Attaching to program: /home/smith/postgres/inst/master/bin/postgres, process 28062
(gdb) bt
#0 0x00007f442c840e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00000000006d1dde in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffdedd75a0, cur_timeout=-1, set=0xe3eedb8) at latch.c:981
#2 WaitEventSetWait (set=set@entry=0xe3eedb8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffdedd75a0, nevents=nevents@entry=1) at latch.c:935
#3 0x00000000006d2226 in WaitLatchOrSocket (latch=0x7f442c0d9644, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347
#4 0x00000000006d22ed in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302
#5 0x00000000005ef4e3 in gather_readnext (gatherstate=0xe3d7ce8) at nodeGather.c:384
#6 gather_getnext (gatherstate=0xe3d7ce8) at nodeGather.c:283
#7 ExecGather (node=node@entry=0xe3d7ce8) at nodeGather.c:229
#8 0x00000000005dd728 in ExecProcNode (node=node@entry=0xe3d7ce8) at execProcnode.c:515
#9 0x00000000005f995c in ExecNestLoop (node=node@entry=0x10ef9d90) at nodeNestloop.c:174
#10 0x00000000005dd7b8 in ExecProcNode (node=node@entry=0x10ef9d90) at execProcnode.c:476
#11 0x00000000005f4990 in ExecLimit (node=node@entry=0x10ef9ac0) at nodeLimit.c:91
#12 0x00000000005dd6b8 in ExecProcNode (node=node@entry=0x10ef9ac0) at execProcnode.c:531
#13 0x00000000005d993f in ExecutePlan (dest=0x5babba0, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x10ef9ac0, estate=0x368b6e8) at execMain.c:1567
#14 standard_ExecutorRun (queryDesc=0x52a89f8, direction=<optimized out>, count=0) at execMain.c:338
#15 0x00000000006f7118 in PortalRunSelect (portal=portal@entry=0x3e84a10, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x5babba0) at pquery.c:946
#16 0x00000000006f863e in PortalRun (portal=0x3e84a10, count=9223372036854775807, isTopLevel=<optimized out>, dest=0x5babba0, altdest=0x5babba0, completionTag=0x7fffdedd7b30 "") at pquery.c:787
#17 0x00000000006f5ef3 in exec_simple_query (query_string=<optimized out>) at postgres.c:1094
#18 PostgresMain (argc=65554960, argv=0xa38beb0, dbname=0x14a2578 "regression", username=0xa38c088 "\260\276\070\n") at postgres.c:4059
#19 0x000000000046c82a in BackendRun (port=0x14c75b0) at postmaster.c:4258
#20 BackendStartup (port=0x14c75b0) at postmaster.c:3932
#21 ServerLoop () at postmaster.c:1690
#22 0x0000000000690aae in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x14a1560) at postmaster.c:1298
#23 0x000000000046d78d in main (argc=4, argv=0x14a1560) at main.c:228

(gdb) p debug_query_string
select ref_68.is_trigger_deletable as c0, (select d from
public.renamecolumnchild limit 1 offset 15) as c1,
ref_68.is_trigger_insertable_into as c2, 3 as c3 from
information_schema.role_udt_grants as ref_67 left join
information_schema.views as ref_68 on (ref_67.grantee =
ref_68.table_catalog ) where true limit 138;

Attachments:

plan.txt.bz2application/octet-streamDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Seltenreich (#1)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Andreas Seltenreich <seltenreich@gmx.de> writes:

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.
TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File: "list.c", Line: 390)
Sample backtraces of a worker and leader below, plan of leader attached.
The collected queries don't seem to reproduce it.

Odd. My understanding of the restrictions on parallel query is that
anything involving a SubPlan ought not be parallelized; and this query
doesn't seem to get parallelized when I try it. It seems like what you're
seeing is that sometimes that restriction fails to be checked, but how
could that happen?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#2)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andreas Seltenreich <seltenreich@gmx.de> writes:

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.
TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File:

"list.c", Line: 390)

Sample backtraces of a worker and leader below, plan of leader attached.
The collected queries don't seem to reproduce it.

Odd. My understanding of the restrictions on parallel query is that
anything involving a SubPlan ought not be parallelized;

Subplan references are considered parallel-restricted, so parallel plan can
be generated if there are subplans in a query, but they shouldn't be pushed
to workers. I have tried a somewhat simpler example to see if we pushdown
anything parallel restricted to worker in case of joins and it turned out
there are cases when that can happen. Consider below example:

create or replace function parallel_func_select() returns integer
as $$
declare
ret_val int;
begin
ret_val := 1000;
return ret_val;
end;
$$ language plpgsql Parallel Restricted;

CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;

Explain Verbose SELECT t1.c1 + parallel_func_select(), t2.c1 FROM t1 JOIN
t2 ON t1.c1 = t2.c1;

QUERY PLAN

--------------------------------------------------------------------------------
--------
Gather (cost=32813.00..537284.53 rows=1000000 width=8)
Output: ((t1.c1 + parallel_func_select())), t2.c1
Workers Planned: 2
-> Hash Join (cost=31813.00..436284.53 rows=1000000 width=8)
Output: (t1.c1 + parallel_func_select()), t2.c1
Hash Cond: (t1.c1 = t2.c1)
-> Parallel Seq Scan on public.t1 (cost=0.00..95721.08
rows=4166608 w
idth=4)
Output: t1.c1, t1.c2
-> Hash (cost=15406.00..15406.00 rows=1000000 width=4)
Output: t2.c1
-> Seq Scan on public.t2 (cost=0.00..15406.00 rows=1000000
widt
h=4)
Output: t2.c1
(12 rows)

From the above output it is clear that parallel restricted function is
pushed down below gather node. I found that though we have have care fully
avoided to push pathtarget below GatherPath in apply_projection_to_path()
if pathtarget contains any parallel unsafe or parallel restricted clause,
but we are separately also trying to apply pathtarget to partialpath list
which doesn't seem to be the correct way even if it is required. I think
this has been added during parallel aggregate patch and it seems to me this
is not required after the changes related to GatherPath in
apply_projection_to_path().

After applying the attached patch, it avoids to add parallel restricted
clauses below gather path.

Now back to the original bug, if you notice in plan file attached in
original bug report, the subplan is pushed below Gather node in target
list, but not to immediate join, rather at one more level down to SeqScan
path. I am still not sure how it has manage to push the restricted clauses
to that down the level.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

avoid_restricted_clause_below_gather_v1.patchapplication/octet-stream; name=avoid_restricted_clause_below_gather_v1.patchDownload+0-13
#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#3)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Sat, May 7, 2016 at 6:37 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, May 6, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andreas Seltenreich <seltenreich@gmx.de> writes:

when fuzz testing master as of c1543a8, parallel workers trigger the
following assertion in ExecInitSubPlan every couple hours.
TRAP: FailedAssertion("!(list != ((List *) ((void *)0)))", File:

"list.c", Line: 390)

Sample backtraces of a worker and leader below, plan of leader

attached.

The collected queries don't seem to reproduce it.

Odd. My understanding of the restrictions on parallel query is that
anything involving a SubPlan ought not be parallelized;

Subplan references are considered parallel-restricted, so parallel plan

can be generated if there are subplans in a query, but they shouldn't be
pushed to workers. I have tried a somewhat simpler example to see if we
pushdown anything parallel restricted to worker in case of joins and it
turned out there are cases when that can happen. Consider below example:

From the above output it is clear that parallel restricted function is

pushed down below gather node. I found that though we have have care fully
avoided to push pathtarget below GatherPath in apply_projection_to_path()
if pathtarget contains any parallel unsafe or parallel restricted clause,
but we are separately also trying to apply pathtarget to partialpath list
which doesn't seem to be the correct way even if it is required. I think
this has been added during parallel aggregate patch and it seems to me this
is not required after the changes related to GatherPath in
apply_projection_to_path().

After applying the attached patch, it avoids to add parallel restricted

clauses below gather path.

Now back to the original bug, if you notice in plan file attached in

original bug report, the subplan is pushed below Gather node in target
list, but not to immediate join, rather at one more level down to SeqScan
path. I am still not sure how it has manage to push the restricted clauses
to that down the level.

On further analysis, I think I know what is going on in the original bug
report. We add the Vars (build_base_rel_tlists) and PlaceholderVars
(add_placeholders_to_base_rels()) to each relations (RelOptInfo) target
during qurey_planner and the Subplans are added as PlaceHolderVars in
target expressions. Now while considering whether a particular rel can be
parallel in set_rel_consider_parallel(), we don't check the target
expressions to allow the relation for parallelism. I think we can prohibit
the relation to be considered for parallelism if it's target expressions
contain any parallel restricted clause. Fix on those lines is attached
with this mail.

Thanks to Dilip Kumar for helping me in narrowing down this particular
problem. We were not able to generate the exact test, but I think the
above theory is sufficient to prove that it can cause a problem as seen in
the original bug report.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

prohibit_parallel_clause_below_rel_v1.patchapplication/octet-stream; name=prohibit_parallel_clause_below_rel_v1.patchDownload+8-7
#5Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#4)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 12, 2016 at 7:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On further analysis, I think I know what is going on in the original bug
report. We add the Vars (build_base_rel_tlists) and PlaceholderVars
(add_placeholders_to_base_rels()) to each relations (RelOptInfo) target
during qurey_planner and the Subplans are added as PlaceHolderVars in target
expressions. Now while considering whether a particular rel can be parallel
in set_rel_consider_parallel(), we don't check the target expressions to
allow the relation for parallelism. I think we can prohibit the relation to
be considered for parallelism if it's target expressions contain any
parallel restricted clause. Fix on those lines is attached with this mail.

Thanks to Dilip Kumar for helping me in narrowing down this particular
problem. We were not able to generate the exact test, but I think the above
theory is sufficient to prove that it can cause a problem as seen in the
original bug report.

I could be wrong, but I thought that the target list for an expression
would always contain only Vars at this stage. Non-default tlists get
injected at the end of scan/join planning. Am I wrong?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Robert Haas <robertmhaas@gmail.com> writes:

I could be wrong, but I thought that the target list for an expression
would always contain only Vars at this stage. Non-default tlists get
injected at the end of scan/join planning. Am I wrong?

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally contains
* Var and PlaceHolderVar nodes for the values we need to
* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 12, 2016 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I could be wrong, but I thought that the target list for an expression
would always contain only Vars at this stage. Non-default tlists get
injected at the end of scan/join planning. Am I wrong?

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally contains
* Var and PlaceHolderVar nodes for the values we need to
* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Robert Haas <robertmhaas@gmail.com> writes:

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally contains
* Var and PlaceHolderVar nodes for the values we need to
* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

But consider an appendrel representing

(SELECT x+1 FROM t1 UNION ALL SELECT y+2 FROM t2) ss(a)

The RTE for ss will have a reltarget list containing just "a".
Once we pull up the subqueries, the reltarget lists for the two child
appendrel members will need to contain "x+1" and "y+2" in order to be
equivalent to the parent's reltarget list. See set_append_rel_size(),
which does that transformation.

This doesn't happen with ordinary subquery flattening because there
isn't a RelOptInfo corresponding to an ordinary subquery that's been
pulled up into the parent query.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#8)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally

contains

* Var and PlaceHolderVar nodes for the values we need

to

* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

Does this mean that base rels can't contain PlaceHolderVars? Isn't it
possible in below code:

query_planner()
{
..
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluatable at a base rel.
*/
add_placeholders_to_base_rels(root);
..
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#9)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Fri, May 13, 2016 at 9:43 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally

contains

* Var and PlaceHolderVar nodes for the values we

need to

* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

Does this mean that base rels can't contain PlaceHolderVars? Isn't it

possible in below code:

Here I want to ask base rels which are plain rels?

It might be that I am missing something, but if we debug the serial plan
for original query [1]select ref_68.is_trigger_deletable as c0, (select d from public.renamecolumnchild limit 1 offset 15) as c1, ref_68.is_trigger_insertable_into as c2, 3 as c3 from information_schema.role_udt_grants as ref_67 left join information_schema.views as ref_68 on (ref_67.grantee = ref_68.table_catalog ) where true limit 138; for which this issue is reported, we have noticed
that PlaceHolderVars that contain subplans are added to base rels for which
RTE kind is (RTE_RELATION).

[1]: select ref_68.is_trigger_deletable as c0, (select d from public.renamecolumnchild limit 1 offset 15) as c1, ref_68.is_trigger_insertable_into as c2, 3 as c3 from information_schema.role_udt_grants as ref_67 left join information_schema.views as ref_68 on (ref_67.grantee = ref_68.table_catalog ) where true limit 138;
public.renamecolumnchild limit 1 offset 15) as c1,
ref_68.is_trigger_insertable_into as c2, 3 as c3 from
information_schema.role_udt_grants as ref_67 left join
information_schema.views as ref_68 on (ref_67.grantee =
ref_68.table_catalog ) where true limit 138;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#10)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Fri, May 13, 2016 at 10:31 AM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

Here I want to ask base rels which are plain rels?

It might be that I am missing something, but if we debug the serial plan
for original query [1] for which this issue is reported, we have noticed
that PlaceHolderVars that contain subplans are added to base rels for which
RTE kind is (RTE_RELATION).

[1] - select ref_68.is_trigger_deletable as c0, (select d from

Finally, I could reproduce this issue, with just three tables.

If targetlist of top query has subquery references and if subquery is part
of OUTER join, then while pulling up the subquery,
it will create PlaceHolderVar, which will have expressions.

After applying prohibit_parallel_clause_below_rel_v1.patch, it is not
selecting parallel plan.
So now its clear that because of sub query pullup, we may get expression in
targetlist while creating single table path list. So we need to avoid
parallel plan if it contains expression.

Below is my test:
--------------------------

postgres=# CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
postgres-# generate_series(1, 10000000) g;

SELECT 10000000
postgres=#
postgres=# CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
postgres-# generate_series(1, 1000000) g;
SELECT 1000000
postgres=# CREATE TABLE t3(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;
SELECT 1000000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=# analyze t3;
ANALYZE

postgres=# explain select t1.c1, y.x from t1 left join (select (select
t3.c1 from t3 where t3.c1=t2.c1) as x from t2)y on (y.x=t1.c1);
QUERY PLAN

-------------------------------------------------------------------------------------
Hash Right Join (cost=319113.85..96991333895.85 rows=9999860 width=8)
Hash Cond: ((SubPlan 2) = t1.c1)
-> Gather (cost=1000.00..7460943906.00 rows=1000000 width=8)
Workers Planned: 2
-> Parallel Seq Scan on t2 (cost=0.00..7460842906.00 rows=416667
width=8)
SubPlan 1
-> Seq Scan on t3 (cost=0.00..17906.00 rows=1 width=4)
Filter: (c1 = t2.c1)
-> Hash (cost=154053.60..154053.60 rows=9999860 width=4)
-> Seq Scan on t1 (cost=0.00..154053.60 rows=9999860 width=4)
SubPlan 2
-> Seq Scan on t3 t3_1 (cost=0.00..17906.00 rows=1 width=4)
Filter: (c1 = t2.c1)
(13 rows)

postgres=# select t1.c1, y.x from t1 left join (select (select t3.c1 from
t3 where t3.c1=t2.c1) as x from t2)y on (y.x=t1.c1);
LOG: worker process: parallel worker for PID 109446 (PID 109483) was
terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12Andreas Seltenreich
seltenreich@gmx.de
In reply to: Dilip Kumar (#11)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

Amit Kapila writes:

avoid_restricted_clause_below_gather_v1.patch
prohibit_parallel_clause_below_rel_v1.patch

I didn't observe any parallel worker related coredumps since applying
these. The same amount of testing done before applying them yielded
about a dozend.

Dilip Kumar writes:

So now its clear that because of sub query pullup, we may get expression in
targetlist while creating single table path list. So we need to avoid
parallel plan if it contains expression.

This sounds like a rather heavy restriction though…

regards,
Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Andreas Seltenreich (#12)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Sun, May 22, 2016 at 9:32 PM, Andreas Seltenreich <seltenreich@gmx.de>
wrote:

Amit Kapila writes:

avoid_restricted_clause_below_gather_v1.patch
prohibit_parallel_clause_below_rel_v1.patch

I didn't observe any parallel worker related coredumps since applying
these. The same amount of testing done before applying them yielded
about a dozend.

Thanks for verification.

Dilip Kumar writes:

So now its clear that because of sub query pullup, we may get

expression in

targetlist while creating single table path list. So we need to avoid
parallel plan if it contains expression.

This sounds like a rather heavy restriction though…

I think what Dilip means by above statement is to avoid parallel plan if
target list contains parallel unsafe or restricted expressions. We already
restrict generation of parallel plans if qualification for a relation
contains such expressions (refer set_rel_consider_parallel()), so this
doesn't sound to be heavy restriction.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#8)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 12, 2016 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Target list for a relation, you mean? See relation.h:

* reltarget - Default Path output tlist for this rel; normally

contains

* Var and PlaceHolderVar nodes for the values we need

to

* output from this relation.
* List is in no particular order, but all rels of an
* appendrel set must use corresponding orders.
* NOTE: in an appendrel child relation, may contain
* arbitrary expressions pulled up from a subquery!

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

But consider an appendrel representing

(SELECT x+1 FROM t1 UNION ALL SELECT y+2 FROM t2) ss(a)

The RTE for ss will have a reltarget list containing just "a".
Once we pull up the subqueries, the reltarget lists for the two child
appendrel members will need to contain "x+1" and "y+2" in order to be
equivalent to the parent's reltarget list. See set_append_rel_size(),
which does that transformation.

I think this can also lead to an issue, as currently for child relations,
we just copy consider_parallel flag of parent. Now, if the child rel
target list is adjusted such that it contains parallel restricted
expression, then we are bound to fail. I think to handle append rel case
appropriately, we should compute the consider_parallel flag for each child
relation in set_append_rel_size() and then later when computing the flag
for parent rel, consider the flag for child rels (basically if any child
rel has consider_parallel as false, then set consider_parallel for parent
as false). To achieve this, we need to call set_rel_consider_parallel()
after calling set_rel_size().

Thoughts?

Just to summarize, apart from above issue, we have discussed two different
issues related to parallel query in this thread.
a. Push down of parallel restricted clauses to nodes below gather. Patch
to fix same is posted upthread [1]/messages/by-id/CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com.
b. Wrong assumption that target list can only contain Vars. Patch to fix
same is posted upthread [2]/messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com. Test which prove our assumption is wrong is
also posted upthread [3]/messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com.

I will add this issue in list of open issues for 9.6 @
https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items

[1]: /messages/by-id/CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com
/messages/by-id/CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com
[2]: /messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
/messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
[3]: /messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com
/messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#15Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#14)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 26, 2016 at 05:08:31PM +0530, Amit Kapila wrote:

Just to summarize, apart from above issue, we have discussed two different
issues related to parallel query in this thread.
a. Push down of parallel restricted clauses to nodes below gather. Patch
to fix same is posted upthread [1].
b. Wrong assumption that target list can only contain Vars. Patch to fix
same is posted upthread [2]. Test which prove our assumption is wrong is
also posted upthread [3].

I will add this issue in list of open issues for 9.6 @
https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items

[1] -
/messages/by-id/CAA4eK1Ky2=HsTsT4hmfL=EAL5rv0_t59tvWzVK9HQKvN6Dovkw@mail.gmail.com
[2] -
/messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
[3] -
/messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com

[Action required within 72 hours. This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20160527025039.GA447393@tornado.leadboat.com and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#3)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Sat, May 7, 2016 at 9:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

From the above output it is clear that parallel restricted function is
pushed down below gather node. I found that though we have have care fully
avoided to push pathtarget below GatherPath in apply_projection_to_path() if
pathtarget contains any parallel unsafe or parallel restricted clause, but
we are separately also trying to apply pathtarget to partialpath list which
doesn't seem to be the correct way even if it is required. I think this has
been added during parallel aggregate patch and it seems to me this is not
required after the changes related to GatherPath in
apply_projection_to_path().

Good catch. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Thu, May 12, 2016 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

Hmm. Dilip's example in
/messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com
seems to show that it's possible to end up with a targetlist
containing non-Vars even for a baserel. In that example,
pull_up_subqueries() injects t2 into the parent query with a target
list containing two items, one of which is a PlaceHolderVar
referencing the subplan for t3.

Unfortunately, that makes it a bit hard to avoid wasting cycles here.
Amit's patch in
/messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com
is a bit brute force: it just checks the target list for every
relation for parallel-restricted constructs. In most cases, no
subqueries having been pulled up, it will find none, but it'll still
have to walk the whole target list to figure that out. If we had some
way of knowing that nothing was pulled up into the current rel, we
could avoid that. This seems to apply equally to baserels and
appendrels; both will commonly have a target list containing only
Vars, but both can end up with non-Vars in the target list after
subqueries are pulled up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#15)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Wed, Jun 1, 2016 at 9:35 PM, Noah Misch <noah@leadboat.com> wrote:

[Action required within 72 hours. This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

Status update:

I'm working on it. I think there are several issues here, of which
I've committed a fix for just one. I hope to have them all fixed by
next Friday. If not, I'll send another update then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#17)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Sat, Jun 4, 2016 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 12, 2016 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Err, wow. That makes my head hurt. Can you explain why this case
only arises for appendrel children, and not for plain rels?

Well, plain rels only output Vars ;-)

Hmm. Dilip's example in

/messages/by-id/CAFiTN-vzg5BkK6kAh3OMhvgRu-uJvkjz47ybtopMAfGJp=zWqA@mail.gmail.com

seems to show that it's possible to end up with a targetlist
containing non-Vars even for a baserel. In that example,
pull_up_subqueries() injects t2 into the parent query with a target
list containing two items, one of which is a PlaceHolderVar
referencing the subplan for t3.

Unfortunately, that makes it a bit hard to avoid wasting cycles here.
Amit's patch in

/messages/by-id/CAA4eK1L-Uo=s4=0jvvVA51pj06u5WdDvSQg673yuxJ_Ja+x86Q@mail.gmail.com

is a bit brute force: it just checks the target list for every
relation for parallel-restricted constructs. In most cases, no
subqueries having been pulled up, it will find none, but it'll still
have to walk the whole target list to figure that out. If we had some
way of knowing that nothing was pulled up into the current rel, we
could avoid that. This seems to apply equally to baserels and
appendrels;

That seems doable, as for such rels we can only have Vars and
PlaceHolderVars in targetlist. Basically, whenever we are adding
PlaceHolderVars to a relation, just remember that information and use it
later. The patch doing the same is attached with this mail. Now still,
this won't cover the case of ChildRels for an Append Relation as for that
we adjust target list separately in set_append_rel_size. I think we need
to deal with it separately.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

prohibit_parallel_clause_below_rel_v2.patchapplication/octet-stream; name=prohibit_parallel_clause_below_rel_v2.patchDownload+21-10
#20Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#19)
Re: [sqlsmith] Failed assertion in parallel worker (ExecInitSubPlan)

On Mon, Jun 6, 2016 at 6:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

That seems doable, as for such rels we can only have Vars and
PlaceHolderVars in targetlist. Basically, whenever we are adding
PlaceHolderVars to a relation, just remember that information and use it
later. The patch doing the same is attached with this mail. Now still,
this won't cover the case of ChildRels for an Append Relation as for that we
adjust target list separately in set_append_rel_size. I think we need to
deal with it separately.

This approach looks pretty good to me. Here's a revised version of
your patch, with some renaming and other adjustments. I'm not sure
exactly what you're referring to in set_append_rel_size, but I did add
a line of code there that might be relevant.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

check-tlists-rmh.patchbinary/octet-stream; name=check-tlists-rmh.patchDownload+22-3
#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#20)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#22)
#24Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#23)