Assert failure in CTE inlining with view and correlated subquery

Started by Tomas Vondraover 3 years ago8 messages
#1Tomas Vondra
tomas.vondra@enterprisedb.com
2 attachment(s)

Hi,

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE. Consider a simple
example like this:

create table results (
id serial primary key,
run text,
tps float4
);

create view results_agg as
with base_tps as (
select run, tps from results
)
select
run,
count(*) as runs,

(select tps from base_tps b where b.run = r.run) AS base_tps

from results r
group by
run
order by
run;

explain SELECT run FROM results_agg ORDER BY 1;

This crashes on this assert in inline_cte():

Assert(context.refcount == 0);

because the refcount value remains 1. There's a backtrace attached.

I don't know why exactly this happens, my knowledge of CTE inlining is
somewhat limited. The counter is clearly out of sync

but a couple more observations:

1) it fails all the way back to PG12, where CTE inlining was added

2) it does not happen if the CTE is defined as MATERIALIZED

QUERY PLAN
-----------------------------------------
Subquery Scan on results_agg
-> Sort
Sort Key: r.run
CTE base_tps
-> Seq Scan on results
-> HashAggregate
Group Key: r.run
-> Seq Scan on results r
(8 rows)

3) without asserts, it seems to work and the query generates this plan

QUERY PLAN
-----------------------------------------
Subquery Scan on results_agg
-> Sort
Sort Key: r.run
-> HashAggregate
Group Key: r.run
-> Seq Scan on results r
(6 rows)

4) it does not seem to happen without the view, i.e. this works

explain
with base_tps as (
select run, tps from results
)
select run from (
select
run,
count(*) as runs,

(select tps from base_tps b where b.run = r.run) AS base_tps

from results r
group by
run
order by
run
) results_agg order by 1;

The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

test.sqlapplication/sql; name=test.sqlDownload
backtrace.txttext/plain; charset=UTF-8; name=backtrace.txtDownload
#2Richard Guo
guofenglinux@gmail.com
In reply to: Tomas Vondra (#1)
Re: Assert failure in CTE inlining with view and correlated subquery

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?

The query is not actually referencing the cte. So the cte range table
entry would not appear anywhere in the query tree. That's why refcount
is not decremented after inline cte walker.

If we explicitly reference the cte in the query, say in the targetlist,
it would then work.

# explain (costs off) SELECT * FROM results_agg ORDER BY 1;
QUERY PLAN
---------------------------------------
Sort
Sort Key: r.run
-> HashAggregate
Group Key: r.run
-> Seq Scan on results r
SubPlan 1
-> Seq Scan on results
Filter: (run = r.run)
(8 rows)

IMO the culprit is that we incorrectly set cterefcount to one while
actually the cte is not referenced at all.

Thanks
Richard

#3Richard Guo
guofenglinux@gmail.com
In reply to: Tomas Vondra (#1)
Re: Assert failure in CTE inlining with view and correlated subquery

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE.

BTW, seems view is not a necessary condition to reproduce this issue.
For instance:

create table t (a int, b int);

explain (costs off) select a from
(
with t_cte as (select a, b from t)
select
a,
(select b from t_cte where t_cte.a = t.a) AS t_sub
from t
) sub;

Thanks
Richard

#4Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#3)
Re: Assert failure in CTE inlining with view and correlated subquery

On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <
tomas.vondra@enterprisedb.com> wrote:

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE.

BTW, seems view is not a necessary condition to reproduce this issue.
For instance:

create table t (a int, b int);

explain (costs off) select a from
(
with t_cte as (select a, b from t)
select
a,
(select b from t_cte where t_cte.a = t.a) AS t_sub
from t
) sub;

Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would try
to remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.

Thanks
Richard

#5Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Richard Guo (#4)
Re: Assert failure in CTE inlining with view and correlated subquery

On 4/21/22 10:29, Richard Guo wrote:

On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com
<mailto:guofenglinux@gmail.com>> wrote:

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra
<tomas.vondra@enterprisedb.com
<mailto:tomas.vondra@enterprisedb.com>> wrote:

it seems there's something wrong with CTE inlining when there's
a view
containing a correlated subquery referencing the CTE. 

BTW, seems view is not a necessary condition to reproduce this issue.
For instance:

create table t (a int, b int);

explain (costs off) select a from
(
    with t_cte as (select a, b from t)
    select
        a,
        (select b from t_cte where t_cte.a = t.a) AS t_sub
    from t
) sub;

Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would try
to remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.

Right, at some point we remove the unnecessary targetlist entries, but
that ignores the entry may reference a CTE. That's pretty much what I
meant by the counter being "out of sync".

Updating the counter while removing the entry is one option, but maybe
we could simply delay counting the CTE references until after that?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#5)
1 attachment(s)
Re: Assert failure in CTE inlining with view and correlated subquery

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 4/21/22 10:29, Richard Guo wrote:

Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would try
to remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.

Right, at some point we remove the unnecessary targetlist entries, but
that ignores the entry may reference a CTE. That's pretty much what I
meant by the counter being "out of sync".
Updating the counter while removing the entry is one option, but maybe
we could simply delay counting the CTE references until after that?

I think we should just drop this cross-check altogether; it is not nearly
useful enough to justify the work that'd be involved in maintaining
cterefcount accurately for all such transformations. All it's really
there for is to be sure that we don't need to make a subplan for the
inlined CTE.

There are two downstream consumers of cte_plan_ids, which currently just
have Asserts that we made a subplan. I think it'd be worth converting
those to real run-time tests, which leads me to something more or less as
attached.

regards, tom lane

Attachments:

remove-bogus-assertion-about-cte-refcount.patchtext/x-diff; charset=us-ascii; name=remove-bogus-assertion-about-cte-refcount.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 998212dda8..d84f66a81b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2804,7 +2804,8 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	if (ndx >= list_length(cteroot->cte_plan_ids))
 		elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
 	plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
-	Assert(plan_id > 0);
+	if (plan_id <= 0)
+		elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename);
 	cteplan = (Plan *) list_nth(root->glob->subplans, plan_id - 1);
 
 	/* Mark rel with estimated output rows, width, etc */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 95476ada0b..7905bc4654 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3898,7 +3898,8 @@ create_ctescan_plan(PlannerInfo *root, Path *best_path,
 	if (ndx >= list_length(cteroot->cte_plan_ids))
 		elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
 	plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
-	Assert(plan_id > 0);
+	if (plan_id <= 0)
+		elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename);
 	foreach(lc, cteroot->init_plans)
 	{
 		ctesplan = (SubPlan *) lfirst(lc);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 863e0e24a1..df4ca12919 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -61,7 +61,6 @@ typedef struct inline_cte_walker_context
 {
 	const char *ctename;		/* name and relative level of target CTE */
 	int			levelsup;
-	int			refcount;		/* number of remaining references */
 	Query	   *ctequery;		/* query to substitute */
 } inline_cte_walker_context;
 
@@ -1157,13 +1156,9 @@ inline_cte(PlannerInfo *root, CommonTableExpr *cte)
 	context.ctename = cte->ctename;
 	/* Start at levelsup = -1 because we'll immediately increment it */
 	context.levelsup = -1;
-	context.refcount = cte->cterefcount;
 	context.ctequery = castNode(Query, cte->ctequery);
 
 	(void) inline_cte_walker((Node *) root->parse, &context);
-
-	/* Assert we replaced all references */
-	Assert(context.refcount == 0);
 }
 
 static bool
@@ -1226,9 +1221,6 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 			rte->coltypes = NIL;
 			rte->coltypmods = NIL;
 			rte->colcollations = NIL;
-
-			/* Count the number of replacements we've done */
-			context->refcount--;
 		}
 
 		return false;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c5ab53e05c..244d1e1197 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -241,7 +241,8 @@ struct PlannerInfo
 
 	List	   *init_plans;		/* init SubPlans for query */
 
-	List	   *cte_plan_ids;	/* per-CTE-item list of subplan IDs */
+	List	   *cte_plan_ids;	/* per-CTE-item list of subplan IDs (or -1 if
+								 * no subplan was made for that CTE) */
 
 	List	   *multiexpr_params;	/* List of Lists of Params for MULTIEXPR
 									 * subquery outputs */
#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tom Lane (#6)
Re: Assert failure in CTE inlining with view and correlated subquery

On 4/21/22 21:03, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 4/21/22 10:29, Richard Guo wrote:

Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would try
to remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.

Right, at some point we remove the unnecessary targetlist entries, but
that ignores the entry may reference a CTE. That's pretty much what I
meant by the counter being "out of sync".
Updating the counter while removing the entry is one option, but maybe
we could simply delay counting the CTE references until after that?

I think we should just drop this cross-check altogether; it is not nearly
useful enough to justify the work that'd be involved in maintaining
cterefcount accurately for all such transformations. All it's really
there for is to be sure that we don't need to make a subplan for the
inlined CTE.

There are two downstream consumers of cte_plan_ids, which currently just
have Asserts that we made a subplan. I think it'd be worth converting
those to real run-time tests, which leads me to something more or less as
attached.

WFM. I'm not particularly attached to the assert, so if you say it's not
worth it let's get rid of it.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#7)
Re: Assert failure in CTE inlining with view and correlated subquery

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 4/21/22 21:03, Tom Lane wrote:

I think we should just drop this cross-check altogether; it is not nearly
useful enough to justify the work that'd be involved in maintaining
cterefcount accurately for all such transformations. All it's really
there for is to be sure that we don't need to make a subplan for the
inlined CTE.

WFM. I'm not particularly attached to the assert, so if you say it's not
worth it let's get rid of it.

Done.

regards, tom lane