"variable not found in subplan target list"

Started by Alvaro Herreraalmost 3 years ago12 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org

I have to run now so can't dissect it, but while running sqlsmith on the
SQL/JSON patch after Justin's report, I got $SUBJECT in this query:

MERGE INTO public.target_parted as target_0
USING (select
subq_0.c5 as c0,
subq_0.c0 as c1,
ref_0.a as c2,
subq_0.c1 as c3,
subq_0.c9 as c4,
(select c from public.prt2_m_p3 limit 1 offset 1)
as c5,
subq_0.c8 as c6,
ref_0.a as c7,
subq_0.c7 as c8,
subq_0.c1 as c9,
pg_catalog.system_user() as c10
from
public.itest1 as ref_0
left join (select
ref_1.matches as c0,
ref_1.typ as c1,
ref_1.colname as c2,
(select slotname from public.iface limit 1 offset 44)
as c3,
ref_1.matches as c4,
ref_1.op as c5,
ref_1.matches as c6,
ref_1.value as c7,
ref_1.op as c8,
ref_1.op as c9,
ref_1.typ as c10
from
public.brinopers_multi as ref_1
where cast(null as polygon) <@ (select polygon from public.tab_core_types limit 1 offset 22)
) as subq_0
on (cast(null as macaddr8) >= cast(null as macaddr8))
where subq_0.c10 > subq_0.c2
limit 49) as subq_1
ON target_0.b = subq_1.c2
WHEN MATCHED
AND (cast(null as box) |>> cast(null as box))
or (cast(null as lseg) ?-| (select s from public.lseg_tbl limit 1 offset 6)
)
THEN DELETE
WHEN NOT MATCHED AND (EXISTS (
select
21 as c0,
subq_2.c0 as c1
from
public.itest14 as sample_0 tablesample system (3.6)
inner join public.num_exp_sqrt as sample_1 tablesample bernoulli (0.3)
on (cast(null as "char") <= cast(null as "char")),
lateral (select
sample_1.id as c0
from
public.a as ref_2
where (cast(null as lseg) <@ cast(null as line))
or ((select b3 from public.bit_defaults limit 1 offset 80)
<> (select b3 from public.bit_defaults limit 1 offset 4)
)
limit 158) as subq_2
where (cast(null as name) !~ (select t from public.test_tsvector limit 1 offset 5)
)
and ((select bool from public.tab_core_types limit 1 offset 61)
< (select pg_catalog.bool_or(v) from public.rtest_view1)
)))
or (18 is NULL)
THEN INSERT VALUES ( pg_catalog.int4um(
cast(public.func_with_bad_set() as int4)), 13)
WHEN MATCHED AND ((24 is not NULL)
or (true))
or (cast(null as "timestamp") <= cast(null as timestamptz))
THEN UPDATE set
b = target_0.b

Ugh.

I got no more SQL/JSON related crashes so far.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: "variable not found in subplan target list"

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

I have to run now so can't dissect it, but while running sqlsmith on the
SQL/JSON patch after Justin's report, I got $SUBJECT in this query:

Reproduces in HEAD and v15 too (once you replace pg_catalog.system_user
with some function that exists in v15). So it's not the fault of the
JSON patch, nor of my outer-join hacking which had been my first thought.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: "variable not found in subplan target list"

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

I have to run now so can't dissect it, but while running sqlsmith on the
SQL/JSON patch after Justin's report, I got $SUBJECT in this query:

I reduced this down to

MERGE INTO public.target_parted as target_0
USING public.itest1 as ref_0
ON target_0.b = ref_0.a
WHEN NOT MATCHED
THEN INSERT VALUES (42, 13);

The critical moving part seems to just be that the MERGE target
is a partitioned table ... but surely somebody tested that before?

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: "variable not found in subplan target list"

I wrote:

I reduced this down to

MERGE INTO public.target_parted as target_0
USING public.itest1 as ref_0
ON target_0.b = ref_0.a
WHEN NOT MATCHED
THEN INSERT VALUES (42, 13);

The critical moving part seems to just be that the MERGE target
is a partitioned table ... but surely somebody tested that before?

Oh, it's not just any partitioned table:

regression=# \d+ target_parted
Partitioned table "public.target_parted"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | integer | | | | plain | | |
Partition key: LIST (a)
Number of partitions: 0

The planner is reducing the scan of target_parted to
a dummy scan, as is reasonable, but it forgets to
provide ctid as an output from that scan; then the
parent join node is unhappy because it does have
a ctid output. So it looks like the problem is some
shortcut we take while creating the dummy scan.

I suppose that without the planner bug, this'd fail at
runtime for lack of a partition to put (42,13) into.
Because of that, the case isn't really interesting
for production, which may explain the lack of reports.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
1 attachment(s)
Re: "variable not found in subplan target list"

I wrote:

The planner is reducing the scan of target_parted to
a dummy scan, as is reasonable, but it forgets to
provide ctid as an output from that scan; then the
parent join node is unhappy because it does have
a ctid output. So it looks like the problem is some
shortcut we take while creating the dummy scan.

Oh, actually the problem is in distribute_row_identity_vars,
which is supposed to handle this case, but it thinks it doesn't
have to back-fill the rel's reltarget. Wrong. Now that I see
the problem, I wonder if we can't reproduce a similar symptom
without MERGE, which would mean that v14 has the issue too.

The attached seems to fix it, but I'm going to look for a
non-MERGE test case before pushing.

regards, tom lane

Attachments:

fix-missing-ctid-variable.patchtext/x-diff; charset=us-ascii; name=fix-missing-ctid-variable.patchDownload
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 9d377385f1..c1b1557570 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -21,6 +21,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "optimizer/pathnode.h"
+#include "optimizer/planmain.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -994,9 +995,10 @@ distribute_row_identity_vars(PlannerInfo *root)
 	 * certainly process no rows.  Handle this edge case by re-opening the top
 	 * result relation and adding the row identity columns it would have used,
 	 * as preprocess_targetlist() would have done if it weren't marked "inh".
-	 * (This is a bit ugly, but it seems better to confine the ugliness and
-	 * extra cycles to this unusual corner case.)  We needn't worry about
-	 * fixing the rel's reltarget, as that won't affect the finished plan.
+	 * Then re-run build_base_rel_tlists() to ensure that the added columns
+	 * get propagated to the relation's reltarget.  (This is a bit ugly, but
+	 * it seems better to confine the ugliness and extra cycles to this
+	 * unusual corner case.)
 	 */
 	if (root->row_identity_vars == NIL)
 	{
@@ -1006,6 +1008,8 @@ distribute_row_identity_vars(PlannerInfo *root)
 		add_row_identity_columns(root, result_relation,
 								 target_rte, target_relation);
 		table_close(target_relation, NoLock);
+		build_base_rel_tlists(root, root->processed_tlist);
+		/* There are no ROWID_VAR Vars in this case, so we're done. */
 		return;
 	}
 
#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#1)
Re: "variable not found in subplan target list"

So I'm back home and found a couple more weird errors in the log:

MERGE INTO public.idxpart2 as target_0
USING (select 1
from
public.xmltest2 as ref_0
inner join public.prt1_l_p1 as sample_0
inner join fkpart4.droppk as ref_1
on (sample_0.a = ref_1.a )
on (true)
limit 50) as subq_0
left join information_schema.transforms as ref_2
left join public.transition_table_status as sample_1
on (ref_2.transform_type is not NULL)
on (true)
ON target_0.a = sample_1.level
WHEN MATCHED
THEN UPDATE set a = target_0.a;
ERROR: mismatching PartitionPruneInfo found at part_prune_index 0
DETALLE: plan node relids (b 1), pruneinfo relids (b 36)

This one is probably my fault, will look later.

select
pg_catalog.pg_stat_get_buf_fsync_backend() as c9
from
public.tenk2 as ref_0
where (ref_0.stringu2 is NULL)
and (EXISTS (
select 1 from fkpart5.fk1 as ref_1
where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ;

ERROR: subplan "InitPlan 1 (returns $1)" was not initialized
CONTEXTO: parallel worker

select 1 as c0
from
(select
subq_0.c9 as c5,
subq_0.c8 as c9
from
public.iso8859_5_inputs as ref_0,
lateral (select
ref_1.ident as c2,
ref_0.description as c8,
ref_1.used_bytes as c9
from
pg_catalog.pg_backend_memory_contexts as ref_1
where true
) as subq_0
where subq_0.c2 is not NULL) as subq_1
inner join pg_catalog.pg_class as sample_0
on (subq_1.c5 = public.int8alias1in(
cast(case when subq_1.c9 is not NULL then null end
as cstring)))
where true;
ERROR: could not find commutator for operator 53286

There were quite a few of those "variable not found" ones, both
mentioning singular "targetlist" and others that said "targetlists". I
reran them with your patch and they no longer error out, so I guess it's
all the same bug.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison)
http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: "variable not found in subplan target list"

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

So I'm back home and found a couple more weird errors in the log:

ERROR: mismatching PartitionPruneInfo found at part_prune_index 0
DETALLE: plan node relids (b 1), pruneinfo relids (b 36)

This one reproduces for me.

select
pg_catalog.pg_stat_get_buf_fsync_backend() as c9
from
public.tenk2 as ref_0
where (ref_0.stringu2 is NULL)
and (EXISTS (
select 1 from fkpart5.fk1 as ref_1
where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ;

ERROR: subplan "InitPlan 1 (returns $1)" was not initialized
CONTEXTO: parallel worker

Hmph, I couldn't reproduce that, not even with other settings of
debug_parallel_query. Are you running it with non-default
planner parameters?

select 1 as c0
...
ERROR: could not find commutator for operator 53286

I got a slightly different error:

ERROR: missing support function 1(195306,195306) in opfamily 1976

where

regression=# select 195306::regtype;
regtype
------------
int8alias1
(1 row)

So that one is related to the intentionally-somewhat-broken
int8 opclass configuration that equivclass.sql leaves behind.
I've always had mixed emotions about whether leaving that
set up that way was a good idea or not. In principle nothing
really bad should happen, but it can lead to confusing errors
like this one. Maybe it'd be better to roll that back?

regards, tom lane

#8Amit Langote
amitlangote09@gmail.com
In reply to: Tom Lane (#7)
1 attachment(s)
Re: "variable not found in subplan target list"

On Wed, Mar 29, 2023 at 3:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

So I'm back home and found a couple more weird errors in the log:

ERROR: mismatching PartitionPruneInfo found at part_prune_index 0
DETALLE: plan node relids (b 1), pruneinfo relids (b 36)

This one reproduces for me.

I've looked into this one and the attached patch fixes it for me.
Turns out set_plan_refs()'s idea of when the entries from
PlannerInfo.partPruneInfos are transferred into
PlannerGlobal.partPruneInfo was wrong.

Though, I wonder if we need to keep ec386948948 that introduced the
notion of part_prune_index around if the project that needed it [1]https://commitfest.postgresql.org/42/3478/
has moved on to an entirely different approach altogether, one that
doesn't require hacking up the pruning code.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1]: https://commitfest.postgresql.org/42/3478/

Attachments:

fix-wrong-part_prune_index.patchapplication/x-patch; name=fix-wrong-part_prune_index.patchDownload
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5cc8366af6..bd82960169 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -350,6 +350,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 			palloc0(list_length(glob->subplans) * sizeof(bool));
 	}
 
+	/* Also fix up the information in PartitionPruneInfos. */
+	foreach(lc, root->partPruneInfos)
+	{
+		PartitionPruneInfo *pruneinfo = lfirst(lc);
+		ListCell   *l;
+
+		pruneinfo->root_parent_relids =
+			offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
+		foreach(l, pruneinfo->prune_infos)
+		{
+			List	   *prune_infos = lfirst(l);
+			ListCell   *l2;
+
+			foreach(l2, prune_infos)
+			{
+				PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+				/* RT index of the table to which the pinfo belongs. */
+				pinfo->rtindex += rtoffset;
+			}
+		}
+	}
+
 	/* Now fix the Plan tree */
 	result = set_plan_refs(root, plan, rtoffset);
 
@@ -378,31 +401,6 @@ set_plan_references(PlannerInfo *root, Plan *plan)
 		}
 	}
 
-	/* Also fix up the information in PartitionPruneInfos. */
-	foreach(lc, root->partPruneInfos)
-	{
-		PartitionPruneInfo *pruneinfo = lfirst(lc);
-		ListCell   *l;
-
-		pruneinfo->root_parent_relids =
-			offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
-		foreach(l, pruneinfo->prune_infos)
-		{
-			List	   *prune_infos = lfirst(l);
-			ListCell   *l2;
-
-			foreach(l2, prune_infos)
-			{
-				PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
-				/* RT index of the table to which the pinfo belongs. */
-				pinfo->rtindex += rtoffset;
-			}
-		}
-
-		glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
-	}
-
 	return result;
 }
 
@@ -1718,6 +1716,28 @@ set_customscan_references(PlannerInfo *root,
 	cscan->custom_relids = offset_relid_set(cscan->custom_relids, rtoffset);
 }
 
+/*
+ * fix_part_prune_index
+ *		Adds the PartitionPruneInfo present in root->partPruneIndex at given
+ *		index into the list in PlannerGlobal and returns the index in the new
+ *		list.
+ */
+static int
+fix_part_prune_index(int part_prune_index, PlannerInfo *root)
+{
+	PlannerGlobal  *glob = root->glob;
+	PartitionPruneInfo *pruneinfo;
+
+	Assert(part_prune_index >= 0 &&
+		   part_prune_index < list_length(root->partPruneInfos));
+	pruneinfo = list_nth_node(PartitionPruneInfo, root->partPruneInfos,
+							  part_prune_index);
+
+	glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+
+	return list_length(glob->partPruneInfos) - 1;
+}
+
 /*
  * set_append_references
  *		Do set_plan_references processing on an Append
@@ -1771,11 +1791,11 @@ set_append_references(PlannerInfo *root,
 	aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
 
 	/*
-	 * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
-	 * the index.
+	 * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
 	 */
 	if (aplan->part_prune_index >= 0)
-		aplan->part_prune_index += list_length(root->glob->partPruneInfos);
+		aplan->part_prune_index =
+			fix_part_prune_index(aplan->part_prune_index, root);
 
 	/* We don't need to recurse to lefttree or righttree ... */
 	Assert(aplan->plan.lefttree == NULL);
@@ -1838,11 +1858,11 @@ set_mergeappend_references(PlannerInfo *root,
 	mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
 
 	/*
-	 * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
-	 * the index.
+	 * Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
 	 */
 	if (mplan->part_prune_index >= 0)
-		mplan->part_prune_index += list_length(root->glob->partPruneInfos);
+		mplan->part_prune_index =
+			fix_part_prune_index(mplan->part_prune_index, root);
 
 	/* We don't need to recurse to lefttree or righttree ... */
 	Assert(mplan->plan.lefttree == NULL);
#9Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Langote (#8)
Re: "variable not found in subplan target list"

On 2023-Mar-29, Amit Langote wrote:

On Wed, Mar 29, 2023 at 3:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

So I'm back home and found a couple more weird errors in the log:

ERROR: mismatching PartitionPruneInfo found at part_prune_index 0
DETALLE: plan node relids (b 1), pruneinfo relids (b 36)

This one reproduces for me.

I've looked into this one and the attached patch fixes it for me.
Turns out set_plan_refs()'s idea of when the entries from
PlannerInfo.partPruneInfos are transferred into
PlannerGlobal.partPruneInfo was wrong.

Thanks for the patch. I've pushed it to github for CI testing, and if
there are no problems I'll put it in.

Though, I wonder if we need to keep ec386948948 that introduced the
notion of part_prune_index around if the project that needed it [1]
has moved on to an entirely different approach altogether, one that
doesn't require hacking up the pruning code.

Hmm, that's indeed tempting.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#9)
Re: "variable not found in subplan target list"

Hi Amit,

On 2023-Mar-30, Alvaro Herrera wrote:

On 2023-Mar-29, Amit Langote wrote:

Though, I wonder if we need to keep ec386948948 that introduced the
notion of part_prune_index around if the project that needed it [1]
has moved on to an entirely different approach altogether, one that
doesn't require hacking up the pruning code.

Hmm, that's indeed tempting.

We have an open item about this, and I see no reason not to do it. I
checked, and putting things back is just a matter of reverting
589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
conflicts, and bumping catversion once more. Would you like to do that
yourself, or do you prefer that I do it? Ideally, we'd do it before
beta1.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#11Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#10)
Re: "variable not found in subplan target list"

On 2023-May-02, Alvaro Herrera wrote:

We have an open item about this, and I see no reason not to do it. I
checked, and putting things back is just a matter of reverting
589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
conflicts, and bumping catversion once more. Would you like to do that
yourself, or do you prefer that I do it? Ideally, we'd do it before
beta1.

I have pushed the revert now.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#12Amit Langote
amitlangote09@gmail.com
In reply to: Alvaro Herrera (#11)
Re: "variable not found in subplan target list"

Hi Alvaro,

On Thu, May 4, 2023 at 19:44 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-May-02, Alvaro Herrera wrote:

We have an open item about this, and I see no reason not to do it. I
checked, and putting things back is just a matter of reverting
589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
conflicts, and bumping catversion once more. Would you like to do that
yourself, or do you prefer that I do it? Ideally, we'd do it before
beta1.

I have pushed the revert now.

Thanks for taking care of it.

(Wouldn’t have been able to get to it till Monday myself.)

--

Thanks, Amit Langote
EDB: http://www.enterprisedb.com