[BUG] Remove self joins causes 'variable not found in subplan target lists' error
Hi, hackers!
We have encountered a bug in the planner which raises an error 'variable
not found in target lists'.
To reproduce apply patch '0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patch'
and run this query:
```
create table t1(i int, j int);
create table t2(i int, j int);
create table t3(i int, j int);
create table t4(i int, j int);
create table t5(i int, j int);
create unique index on t2(i, j);
select t1.j from t1
left join t2 on t1.i = t2.i and t2.j = 1
left join (
select i from t3
where exists (
select true from t4
left join t5 as t5 on t5.j = t4.j
left join t5 as t6 on t6.j = t4.j
where t4.i = t3.i
and t4.i =2
)
) t on t1.j = t.i;
```
This bug was caused by 'rebuild_eclass_attr_needed' function which
did not process EC with constants. The logic behind it is clear - we
can substitute these attributes with constants and do not use them
later, but it is not true when `EXISTS` converts to JOIN. If planner
decided to use Unique + Sort, then the following happens:
1. `t4.i`, `t3.i` and `2` are contained in same equivalence class, so
'ec_has_const' is 'true'
2. During 'rebuild_eclass_attr_needed' their EC is skipped because it
contains constant (src/backend/optimizer/path/equivclass.c:2590)
```
if (list_length(ec->ec_members) > 1 && !ec->ec_has_const)
```
3. 'attr_needed' of 't4.i' is still NULL, so it is not added to 'pathtarget'
of 'RelOptInfo' (src/backend/optimizer/util/relnode.c:1199)
```
if (!bms_nonempty_difference(baserel->attr_needed[ndx], relids))
continue; /* nope, skip it */
```
4. 'UniquePath' is created, but `t4.i` (which must be unique expression)
is not in 'pathtarget' of RelOptInfo, so added to path's targetlist
manually (src/backend/optimizer/plan/planner.c:8395)
```
tle = tlist_member(uniqexpr, newtlist);
if (!tle)
{
tle = makeTargetEntry((Expr *) uniqexpr,
nextresno,
NULL,
false);
newtlist = lappend(newtlist, tle);
}
```
5. When creating a Plan targetlist is just copied from Path
6. At the very end 'set_plan_references' can not find expression at
level below (because it was "not needed" and was not added to
targetlist), so it throws 'variable not found in subplan target lists'
This patch fixes this error by taking considering multi-member EC
even with constants, but skipping constant members during members
iteration.
There are 4 attachments:
- 'schema.sql' - schema for reproducing the bug
- 'query.sql' - actual query to raise error
- '0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patch' - disable
JOIN_SEMI to be able to reproduce the bug
- '0001-fix-variable-not-found-in-subplan-target-lists.patch' - patch
with actual fix of this bug
I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.
---
Regards,
Sergey Soloviev
Tantor Labs LLC
Attachments:
0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patchtext/x-patch; charset=UTF-8; name=0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patchDownload
From 9ba6ea7c742b7985b43c8de463b6959f4e16c400 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Fri, 22 Aug 2025 16:43:14 +0300
Subject: [PATCH] Disable 'JOIN_SEMI' and 'JOIN_RIGHT_SEMI' paths
This patch disables creation of SEMI/RIGHT-SEMI paths in order to make
reproducing the bug easier.
---
src/backend/optimizer/path/joinrels.c | 35 +++++++++++++++------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..aded9e11b83 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -972,22 +972,25 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
* then do an innerjoin (see comments in join_is_legal). In the
* latter case we can't apply JOIN_SEMI joining.
*/
- if (bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
- bms_is_subset(sjinfo->min_righthand, rel2->relids))
- {
- if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
- restriction_is_constant_false(restrictlist, joinrel, false))
- {
- mark_dummy_rel(joinrel);
- break;
- }
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
- JOIN_SEMI, sjinfo,
- restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
- JOIN_RIGHT_SEMI, sjinfo,
- restrictlist);
- }
+
+ /*
+ * if (bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
+ * bms_is_subset(sjinfo->min_righthand, rel2->relids))
+ * {
+ * if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
+ * restriction_is_constant_false(restrictlist, joinrel, false))
+ * {
+ * mark_dummy_rel(joinrel);
+ * break;
+ * }
+ * add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ * JOIN_SEMI, sjinfo,
+ * restrictlist);
+ * add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ * JOIN_RIGHT_SEMI, sjinfo,
+ * restrictlist);
+ * }
+ */
/*
* If we know how to unique-ify the RHS and one input rel is
--
2.43.0
0001-fix-variable-not-found-in-subplan-target-lists.patchtext/x-patch; charset=UTF-8; name=0001-fix-variable-not-found-in-subplan-target-lists.patchDownload
From 4ebad7d772827fe2f3f34fcf6681d49fe4a829ff Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Fri, 22 Aug 2025 16:37:02 +0300
Subject: [PATCH] fix: variable not found in subplan target lists
When restoring 'attr_needed' after self-join removal we should also
consider EC with constants because some plan nodes can use such constant
attributes.
For example, after converting `EXISTS` to JOIN planner can create
`UniquePath` referencing attribute which is contained in EC with
constant. But because `attr_needed` is NULL then this attribute is not
returned by JOIN, so at `set_plan_references` fails to find this
attribute in targetlist of node below.
---
src/backend/optimizer/path/equivclass.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 441f12f6c50..58ca75ab7a8 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2586,18 +2586,27 @@ rebuild_eclass_attr_needed(PlannerInfo *root)
*/
Assert(ec->ec_childmembers == NULL);
- /* Need do anything only for a multi-member, no-const EC. */
- if (list_length(ec->ec_members) > 1 && !ec->ec_has_const)
+ /*
+ * Need do anything only for a multi-member. Note that some
+ * wrapping plan nodes (i.e. UniquePath) can reference attributes
+ * contained in EC with constants.
+ */
+ if (list_length(ec->ec_members) > 1)
{
ListCell *lc2;
foreach(lc2, ec->ec_members)
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
- List *vars = pull_var_clause((Node *) cur_em->em_expr,
- PVC_RECURSE_AGGREGATES |
- PVC_RECURSE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ List *vars;
+
+ if (cur_em->em_is_const)
+ continue;
+
+ vars = pull_var_clause((Node *) cur_em->em_expr,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
add_vars_to_attr_needed(root, vars, ec->ec_relids);
list_free(vars);
--
2.43.0
On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:
I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.
Nice catch! Here's a query that reproduces the error without needing
to hack the code.
create table t (a int, b int);
create unique index on t (a);
select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists
This bug was introduced by commit:
commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400
Recalculate where-needed data accurately after a join removal.
(I'm a bit surprised it took us so long to discover it.)
When distributing qual clause "t1.a = t4.a", distribute_qual_to_rels
adds t1.a and t4.a that are used in this clause to the targetlists of
their relations. (However, if the clause gets absorbed into an EC
that contains const, this can result in adding Vars to relations that
do not actually require them. But that's a different problem.)
However, when rebuilding attr_needed bits for t1.a and t4.a after the
join removal, rebuild_eclass_attr_needed fails to restore the bits
because it skips ECs that contain constant, as explained by Sergey.
Later, it turns out that t4.a is needed at the join leval t4/t5/t6 for
the unique-ification of the RHS of the semi-join.
The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var -- and if it's not needed at
upper levels, it shouldn't need to be in the targetlist of the unique
path. For example, in the query above, t4.a does not need to be added
in the targetlist of the unique path, right?
Thanks
Richard
On 25/8/2025 15:28, Richard Guo wrote:
On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.Nice catch! Here's a query that reproduces the error without needing
to hack the code.create table t (a int, b int);
create unique index on t (a);select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists
Thanks for your reproduction.
Unfortunately, it works only in the absence of an ANALYZE, like the
original example.
Also, I would say it is not a self-join-related issue. This example
employs the removal of the 'unnecessary left join'. Currently, I'm
unsure why this example causes the issue: the removing t2 table
shouldn't have any references in ECs within the EXISTS part.
--
regards, Andrei Lepikhov
25.08.2025 16:59, Andrei Lepikhov:
On 25/8/2025 15:28, Richard Guo wrote:
On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.Nice catch! Here's a query that reproduces the error without needing
to hack the code.create table t (a int, b int);
create unique index on t (a);select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target listsThanks for your reproduction.
Unfortunately, it works only in the absence of an ANALYZE, like the original example.
Also, I would say it is not a self-join-related issue. This example employs the removal of the 'unnecessary left join'. Currently, I'm unsure why this example causes the issue: the removing t2 table shouldn't have any references in ECs within the EXISTS part.
Hi!
Yes, this is not created by SJE, but this bug introduced by commit adding SJE logic:
first remove any 'attr_needed' (and other info) and then restore it according
to only needed relations.
Provided example shows bug in the code.
'attr_needed' is cleared at src/backend/optimizer/plan/analyzejoins.c:526.
If we dump the state for relation t4, then we will get
attr_needed[a] = {1, 6} /* {t1, t4} */
And also, there is EC = {t1.a, t4.a, 2}. This comes from WHERE in EXISTS:
t1.a = t4.a AND t4.a = 2
But during the second phase (recreating 'attr_needed') we see that EC contains
constant (2), so skip recreating 'attr_needed[a]' for t4, but it previously had t1
in 'attr_needed' which was not removed by join elimination logic. Roughly
speaking, we have lost dependency with t1.
Thus, error is caused not by removing t2 itself, but by the manipulations
involved.
---
Regards,
Sergey Solviev
Tantor Labs LLC
Richard Guo <guofenglinux@gmail.com> writes:
The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var
Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.
Not sure about a good way to make it account for that though.
Detection of redundancies of this sort is kind of buried in
pathkey construction, but in this context we'd like to find out
earlier: we need to avoid attaching a new tlist entry if the
expression is constant, else we'll still get the failure.
regards, tom lane
25.08.2025 16:28, Richard Guo пишет:
On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.Nice catch! Here's a query that reproduces the error without needing
to hack the code.create table t (a int, b int);
create unique index on t (a);select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target listsThis bug was introduced by commit:
commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400Recalculate where-needed data accurately after a join removal.
(I'm a bit surprised it took us so long to discover it.)
When distributing qual clause "t1.a = t4.a", distribute_qual_to_rels
adds t1.a and t4.a that are used in this clause to the targetlists of
their relations. (However, if the clause gets absorbed into an EC
that contains const, this can result in adding Vars to relations that
do not actually require them. But that's a different problem.)However, when rebuilding attr_needed bits for t1.a and t4.a after the
join removal, rebuild_eclass_attr_needed fails to restore the bits
because it skips ECs that contain constant, as explained by Sergey.
Later, it turns out that t4.a is needed at the join leval t4/t5/t6 for
the unique-ification of the RHS of the semi-join.The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var -- and if it's not needed at
upper levels, it shouldn't need to be in the targetlist of the unique
path. For example, in the query above, t4.a does not need to be added
in the targetlist of the unique path, right?Thanks
Richard
Hi, Richard!
Thanks for finding this example. I have added it to join.sql regress test.
New patch version is attached.
I also thought about case to just check Var to be a constant. But
the main question is 'if it is the only node affected?'. Example shows
that error is caused by Unique path, but maybe there are another
nodes which will cause such error.
Yes only 'create_unique_paths' creates new TargetEntry. But I think
the root cause is that when recreating 'attr_needed', the dependency
between this relation and some other relation is lost.
---
Regards,
Sergey Soloviev
Tantor Labs LLC
Attachments:
0002-fix-variable-not-found-in-subplan-target-lists.patchtext/x-patch; charset=UTF-8; name=0002-fix-variable-not-found-in-subplan-target-lists.patchDownload
From dad22e812b3147e159f0203a41e9bf042a1d25d4 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Mon, 25 Aug 2025 19:53:39 +0300
Subject: [PATCH] fix: variable not found in subplan target lists
When restoring 'attr_needed' after self-join removal we should also
consider EC with constants because some plan nodes can use such constant
attributes.
For example, after converting `EXISTS` to JOIN planner can create
`UniquePath` referencing attribute which is contained in EC with
constant. But because `attr_needed` is NULL then this attribute is not
returned by JOIN, so at `set_plan_references` fails to find this
attribute in targetlist of node below.
---
src/backend/optimizer/path/equivclass.c | 21 +++++++----
src/test/regress/expected/join.out | 47 +++++++++++++++++++++++++
src/test/regress/sql/join.sql | 26 ++++++++++++++
3 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 441f12f6c50..58ca75ab7a8 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2586,18 +2586,27 @@ rebuild_eclass_attr_needed(PlannerInfo *root)
*/
Assert(ec->ec_childmembers == NULL);
- /* Need do anything only for a multi-member, no-const EC. */
- if (list_length(ec->ec_members) > 1 && !ec->ec_has_const)
+ /*
+ * Need do anything only for a multi-member. Note that some
+ * wrapping plan nodes (i.e. UniquePath) can reference attributes
+ * contained in EC with constants.
+ */
+ if (list_length(ec->ec_members) > 1)
{
ListCell *lc2;
foreach(lc2, ec->ec_members)
{
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
- List *vars = pull_var_clause((Node *) cur_em->em_expr,
- PVC_RECURSE_AGGREGATES |
- PVC_RECURSE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ List *vars;
+
+ if (cur_em->em_is_const)
+ continue;
+
+ vars = pull_var_clause((Node *) cur_em->em_expr,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
add_vars_to_attr_needed(root, vars, ec->ec_relids);
list_free(vars);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 98b05c94a11..71654f528b1 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6500,6 +6500,53 @@ where t1.a = s.c;
----------
(0 rows)
+rollback;
+-- yet another join removal bug: EquivalenceClass cleanup must preserve
+-- dependency with JOINs above even if EC has constant
+begin;
+create table t (a int, b int);
+create unique index on t (a);
+explain (costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Nested Loop
+ -> Nested Loop
+ -> Index Only Scan using t_a_idx on t t1
+ Index Cond: (a = 2)
+ -> HashAggregate
+ Group Key: t4.a, t5.a
+ -> Hash Join
+ Hash Cond: (t6.b = t4.b)
+ -> Seq Scan on t t6
+ -> Hash
+ -> Hash Join
+ Hash Cond: (t5.b = t4.b)
+ -> Seq Scan on t t5
+ -> Hash
+ -> Index Scan using t_a_idx on t t4
+ Index Cond: (a = 2)
+ -> Index Only Scan using t_a_idx on t t3
+ Index Cond: (a = t5.a)
+(18 rows)
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
+ a
+---
+(0 rows)
+
rollback;
-- test cases where we can remove a join, but not a PHV computed at it
begin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5f0a475894d..d62b4abd817 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2420,6 +2420,32 @@ where t1.a = s.c;
rollback;
+-- yet another join removal bug: EquivalenceClass cleanup must preserve
+-- dependency with JOINs above even if EC has constant
+begin;
+
+create table t (a int, b int);
+create unique index on t (a);
+
+explain (costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
+
+rollback;
+
-- test cases where we can remove a join, but not a PHV computed at it
begin;
--
2.43.0
I wrote:
Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.
Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.
There are a couple loose ends yet to be dealt with:
* The fact that we now avoid duplicate unique-ifications is good,
but I think it breaks the test case added by 44e95b572, since
that's no longer demonstrating what it set out to demonstrate.
(See the delta in aggregates.out.) I'm not sure that that's a
huge problem, but we probably at least ought to modify the comment
on that test case.
* What happens if we find that *all* the semi_rhs_exprs are known
constant? We'll compute empty pathkeys and groupClause, but then
what? It looks like create_final_unique_paths etc then build a
useless Unique step. Maybe we should just absorb the input
relation's path list as-is?
regards, tom lane
Attachments:
v3-fix-variable-not-found-in-subplan-target-lists.patchtext/x-diff; charset=us-ascii; name=v3-fix-variable-not-found-in-subplan-target-lists.patchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 65f17101591..bb03ea7ccb4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -8284,8 +8284,8 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
* the needs of the semijoin represented by sjinfo. If it is not possible
* to identify how to make the data unique, NULL is returned.
*
- * If used at all, this is likely to be called repeatedly on the same rel;
- * So we cache the result.
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
*/
RelOptInfo *
create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
@@ -8378,6 +8378,15 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
* While in the loop, build the lists of SortGroupClause's that
* represent the ordering for the sort-based implementation and the
* grouping for the hash-based implementation.
+ *
+ * To complicate matters, some of the values to be unique-ified may be
+ * known redundant by the EquivalenceClass machinery (e.g., because
+ * they have been equated to constants). There is no need to compare
+ * such values during unique-ification, and indeed we had better not
+ * try because the Vars involved may not have propagated as high as
+ * the semijoin's level. We use make_pathkeys_for_sortclauses to
+ * detect such cases, which is a tad inefficient but it doesn't seem
+ * worth building specialized infrastructure for this.
*/
newtlist = make_tlist_from_pathtarget(rel->reltarget);
nextresno = list_length(newtlist) + 1;
@@ -8386,8 +8395,9 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
{
Expr *uniqexpr = lfirst(lc1);
Oid in_oper = lfirst_oid(lc2);
- Oid sortop = InvalidOid;
+ Oid sortop;
TargetEntry *tle;
+ bool made_tle = false;
tle = tlist_member(uniqexpr, newtlist);
if (!tle)
@@ -8398,19 +8408,21 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
false);
newtlist = lappend(newtlist, tle);
nextresno++;
+ made_tle = true;
}
- if (sjinfo->semi_can_btree)
+ /*
+ * Try to build an ORDER BY list to sort the input compatibly. We
+ * do this for each sortable clause even when the clauses are not
+ * all sortable, so that we can detect clauses that are redundant
+ * according to the pathkey machinery.
+ */
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (OidIsValid(sortop))
{
- /* Create an ORDER BY list to sort the input compatibly */
Oid eqop;
SortGroupClause *sortcl;
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
/*
* The Unique node will need equality operators. Normally
* these are the same as the IN clause operators, but if those
@@ -8430,7 +8442,32 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
sortcl->nulls_first = false;
sortcl->hashable = false; /* no need to make this accurate */
sortList = lappend(sortList, sortcl);
+
+ /*
+ * At each step, convert the SortGroupClause list to pathkey
+ * form. If the just-added SortGroupClause is redundant, the
+ * result will be shorter than the SortGroupClause list.
+ */
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList,
+ newtlist);
+ if (list_length(sortPathkeys) != list_length(sortList))
+ {
+ /* Drop the redundant SortGroupClause */
+ sortList = list_delete_last(sortList);
+ Assert(list_length(sortPathkeys) == list_length(sortList));
+ /* Undo tlist addition, if we made one */
+ if (made_tle)
+ {
+ newtlist = list_delete_last(newtlist);
+ nextresno--;
+ }
+ /* We need not consider this clause for hashing, either */
+ continue;
+ }
}
+ else if (sjinfo->semi_can_btree) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
if (sjinfo->semi_can_hash)
{
@@ -8460,8 +8497,14 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
}
}
+ /*
+ * Done building the sortPathkeys and groupClause. But the
+ * sortPathkeys are bogus if not all the clauses were sortable.
+ */
+ if (!sjinfo->semi_can_btree)
+ sortPathkeys = NIL;
+
unique_rel->reltarget = create_pathtarget(root, newtlist);
- sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
}
/* build unique paths based on input rel's pathlist */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7319945ffe3..46cff08b748 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3405,15 +3405,15 @@ set enable_memoize to off;
explain (costs off)
select 1 from tenk1
where (hundred, thousand) in (select twothousand, twothousand from onek);
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------
Hash Join
Hash Cond: (tenk1.hundred = onek.twothousand)
-> Seq Scan on tenk1
Filter: (hundred = thousand)
-> Hash
-> HashAggregate
- Group Key: onek.twothousand, onek.twothousand
+ Group Key: onek.twothousand
-> Seq Scan on onek
(8 rows)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 98b05c94a11..054bcb20596 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6500,6 +6500,68 @@ where t1.a = s.c;
----------
(0 rows)
+rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a
+ Inner Unique: true
+ -> Nested Loop
+ Output: t1.a, t5.a
+ -> Index Only Scan using t_a_key on pg_temp.t t1
+ Output: t1.a
+ Index Cond: (t1.a = 1)
+ -> HashAggregate
+ Output: t5.a
+ Group Key: t5.a
+ -> Hash Join
+ Output: t5.a
+ Hash Cond: (t6.b = t4.b)
+ -> Seq Scan on pg_temp.t t6
+ Output: t6.a, t6.b
+ -> Hash
+ Output: t4.b, t5.b, t5.a
+ -> Hash Join
+ Output: t4.b, t5.b, t5.a
+ Inner Unique: true
+ Hash Cond: (t5.b = t4.b)
+ -> Seq Scan on pg_temp.t t5
+ Output: t5.a, t5.b
+ -> Hash
+ Output: t4.b, t4.a
+ -> Index Scan using t_a_key on pg_temp.t t4
+ Output: t4.b, t4.a
+ Index Cond: (t4.a = 1)
+ -> Index Only Scan using t_a_key on pg_temp.t t3
+ Output: t3.a
+ Index Cond: (t3.a = t5.a)
+(32 rows)
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ a
+---
+ 1
+(1 row)
+
rollback;
-- test cases where we can remove a join, but not a PHV computed at it
begin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5f0a475894d..817132d90de 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2420,6 +2420,32 @@ where t1.a = s.c;
rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+rollback;
+
-- test cases where we can remove a join, but not a PHV computed at it
begin;
BTW, a couple of thoughts came to me after considering this
issue awhile longer:
1. The really fundamental problem is that we don't update
SpecialJoinInfo's semi_operators/semi_rhs_exprs after discovering that
join-level comparisons of a particular value are unnecessary. We will
then not emit the actual join clause ("t1.a = t4.a" in this example),
but we still list t4.a as something that would need unique-ification.
I looked a little bit at whether it'd be reasonable to do that at the
end of construction of EquivalenceClasses, but I think it'd add a lot
more planning cycles than my v3 patch. A more invasive idea could be
to not compute semi_operators/semi_rhs_exprs at all until we've
finished building EquivalenceClasses. I'm not sure if that'd be
adequately cheap either, but maybe it would be.
2. Another thing that's a bit worrisome is the recognition that
a3179ab69's logic for reconstruction of attr_needed might produce
different results than we had to begin with. It's not necessarily
worse: in the particular case we're considering here, the results
are arguably better. But it's scary to think that if there are
any other bugs in that code, we will only find them in queries that
use join removal. That's not a recipe for thorough test coverage.
I wonder if it could be sane to delete the existing logic that builds
attr_needed during initial jointree deconstruction, and instead
always fill attr_needed by running the new "reconstruction" logic.
The trouble with that is we'd have to do it just before starting
join removal, and then again after each successful join removal,
since join removal itself depends on the attr_needed results.
It seems likely that that would net out slower than the current
code. But maybe the difference would be small.
I'm not planning to work on either of these ideas right now,
but I thought I'd put them out there in case someone else is
interested.
regards, tom lane
On Tue, Aug 26, 2025 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.There are a couple loose ends yet to be dealt with:
* The fact that we now avoid duplicate unique-ifications is good,
but I think it breaks the test case added by 44e95b572, since
that's no longer demonstrating what it set out to demonstrate.
(See the delta in aggregates.out.) I'm not sure that that's a
huge problem, but we probably at least ought to modify the comment
on that test case.* What happens if we find that *all* the semi_rhs_exprs are known
constant? We'll compute empty pathkeys and groupClause, but then
what? It looks like create_final_unique_paths etc then build a
useless Unique step. Maybe we should just absorb the input
relation's path list as-is?
Yeah, I think this is a better approach.
Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.
We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.
To be concrete, I'm imagining something like the attached patch.
Currently, it doesn't update the modified semi_rhs_exprs and
semi_operators back to SpecialJoinInfo, but that shouldn't be hard to
add.
One limitation is that the patch doesn't try to detect grouping
expressions that are known equal to each other, so you may still see
redundant grouping expressions, such as in the test case added by
44e95b572. I'm not sure if that's a big issue.
Thanks
Richard
Attachments:
v4-0001-fix-variable-not-found-in-subplan-target-lists.patchapplication/octet-stream; name=v4-0001-fix-variable-not-found-in-subplan-target-lists.patchDownload
From 19d091c74a8a37f6572db3e13f04b222b284e906 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 26 Aug 2025 12:50:27 +0900
Subject: [PATCH v4] fix variable not found in subplan target lists
---
src/backend/optimizer/path/equivclass.c | 87 +++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 49 +++++++++++---
src/include/optimizer/paths.h | 2 +
src/test/regress/expected/join.out | 62 ++++++++++++++++++
src/test/regress/sql/join.sql | 26 ++++++++
5 files changed, 218 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 441f12f6c50..bebbc914618 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2690,6 +2690,93 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
return false;
}
+/*
+ * expr_known_equal_to_constant
+ * Detect whether the given expression is known equal to a constant due to
+ * equivalence relationships.
+ *
+ * "sortop" is the OID of the ordering operator.
+ */
+bool
+expr_known_equal_to_constant(PlannerInfo *root, Expr *expr, Oid sortop)
+{
+ Oid opfamily,
+ opcintype,
+ collation;
+ CompareType cmptype;
+ Oid equality_op;
+ List *opfamilies;
+ ListCell *lc;
+
+ /* Find the operator in pg_amop --- failure shouldn't happen */
+ if (!get_ordering_op_properties(sortop,
+ &opfamily, &opcintype, &cmptype))
+ elog(ERROR, "operator %u is not a valid ordering operator",
+ sortop);
+
+ collation = exprCollation((Node *) expr);
+
+ /*
+ * EquivalenceClasses need to contain opfamily lists based on the family
+ * membership of mergejoinable equality operators, which could belong to
+ * more than one opfamily. So we have to look up the opfamily's equality
+ * operator and get its membership.
+ */
+ equality_op = get_opfamily_member_for_cmptype(opfamily,
+ opcintype,
+ opcintype,
+ COMPARE_EQ);
+ if (!OidIsValid(equality_op)) /* shouldn't happen */
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ COMPARE_EQ, opcintype, opcintype, opfamily);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ if (!opfamilies) /* certainly should find some */
+ elog(ERROR, "could not find opfamilies for equality operator %u",
+ equality_op);
+
+ /*
+ * Ensure the expression exposes the correct type and collation.
+ */
+ expr = canonicalize_ec_expression(expr, opcintype, collation);
+
+ /*
+ * Scan through the existing EquivalenceClasses for a match
+ */
+ foreach(lc, root->eq_classes)
+ {
+ EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc);
+ EquivalenceMemberIterator it;
+ EquivalenceMember *cur_em;
+
+ /* Ignore EC unless it contains pseudoconstants */
+ if (!cur_ec->ec_has_const)
+ continue;
+
+ /* Never match to a volatile EC */
+ if (cur_ec->ec_has_volatile)
+ continue;
+
+ if (collation != cur_ec->ec_collation)
+ continue;
+ if (!equal(opfamilies, cur_ec->ec_opfamilies))
+ continue;
+
+ setup_eclass_member_iterator(&it, cur_ec, NULL);
+ while ((cur_em = eclass_member_iterator_next(&it)) != NULL)
+ {
+ /* Ignore child members */
+ if (cur_em->em_is_child)
+ continue;
+
+ if (opcintype == cur_em->em_datatype &&
+ equal(expr, cur_em->em_expr))
+ return true; /* Match! */
+ }
+ }
+
+ return false;
+}
+
/*
* match_eclasses_to_foreign_key_col
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 65f17101591..1ac9cc869f8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -8284,8 +8284,8 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
* the needs of the semijoin represented by sjinfo. If it is not possible
* to identify how to make the data unique, NULL is returned.
*
- * If used at all, this is likely to be called repeatedly on the same rel;
- * So we cache the result.
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
*/
RelOptInfo *
create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
@@ -8294,6 +8294,10 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
List *sortPathkeys = NIL;
List *groupClause = NIL;
MemoryContext oldcontext;
+ List *semi_rhs_exprs = NIL;
+ List *semi_operators = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
/* Caller made a mistake if SpecialJoinInfo is the wrong one */
Assert(sjinfo->jointype == JOIN_SEMI);
@@ -8307,6 +8311,39 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
return NULL;
+ /*
+ * To complicate matters, some of the values to be unique-ified may be
+ * known redundant by the EquivalenceClass machinery (e.g., because they
+ * have been equated to constants). There is no need to compare such
+ * values during unique-ification, and indeed we had better not try
+ * because the Vars involved may not have propagated as high as the
+ * semijoin's level.
+ */
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+
+ if (OidIsValid(sortop))
+ {
+ if (expr_known_equal_to_constant(root, uniqexpr, sortop))
+ continue;
+ }
+ else if (sjinfo->semi_can_btree) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ semi_rhs_exprs = lappend(semi_rhs_exprs, uniqexpr);
+ semi_operators = lappend_oid(semi_operators, in_oper);
+ }
+
+ /* If there are no columns left to unique-ify, return NULL. */
+ if (semi_rhs_exprs == NIL)
+ return NULL;
+
/*
* When called during GEQO join planning, we are in a short-lived memory
* context. We must make sure that the unique rel and any subsidiary data
@@ -8367,8 +8404,6 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
List *newtlist;
int nextresno;
List *sortList = NIL;
- ListCell *lc1;
- ListCell *lc2;
/*
* The values we are supposed to unique-ify may be expressions in the
@@ -8382,7 +8417,7 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
newtlist = make_tlist_from_pathtarget(rel->reltarget);
nextresno = list_length(newtlist) + 1;
- forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ forboth(lc1, semi_rhs_exprs, lc2, semi_operators)
{
Expr *uniqexpr = lfirst(lc1);
Oid in_oper = lfirst_oid(lc2);
@@ -8407,9 +8442,7 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
SortGroupClause *sortcl;
sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
+ Assert(OidIsValid(sortop));
/*
* The Unique node will need equality operators. Normally
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..db47bb88eb1 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -159,6 +159,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
RelOptInfo *inner_rel);
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
Oid opfamily);
+extern bool expr_known_equal_to_constant(PlannerInfo *root, Expr *expr,
+ Oid sortop);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 98b05c94a11..054bcb20596 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6500,6 +6500,68 @@ where t1.a = s.c;
----------
(0 rows)
+rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a
+ Inner Unique: true
+ -> Nested Loop
+ Output: t1.a, t5.a
+ -> Index Only Scan using t_a_key on pg_temp.t t1
+ Output: t1.a
+ Index Cond: (t1.a = 1)
+ -> HashAggregate
+ Output: t5.a
+ Group Key: t5.a
+ -> Hash Join
+ Output: t5.a
+ Hash Cond: (t6.b = t4.b)
+ -> Seq Scan on pg_temp.t t6
+ Output: t6.a, t6.b
+ -> Hash
+ Output: t4.b, t5.b, t5.a
+ -> Hash Join
+ Output: t4.b, t5.b, t5.a
+ Inner Unique: true
+ Hash Cond: (t5.b = t4.b)
+ -> Seq Scan on pg_temp.t t5
+ Output: t5.a, t5.b
+ -> Hash
+ Output: t4.b, t4.a
+ -> Index Scan using t_a_key on pg_temp.t t4
+ Output: t4.b, t4.a
+ Index Cond: (t4.a = 1)
+ -> Index Only Scan using t_a_key on pg_temp.t t3
+ Output: t3.a
+ Index Cond: (t3.a = t5.a)
+(32 rows)
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ a
+---
+ 1
+(1 row)
+
rollback;
-- test cases where we can remove a join, but not a PHV computed at it
begin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5f0a475894d..817132d90de 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2420,6 +2420,32 @@ where t1.a = s.c;
rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+rollback;
+
-- test cases where we can remove a join, but not a PHV computed at it
begin;
--
2.43.0
On Tue, Aug 26, 2025 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.
Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.
I did consider that, but rejected it. I kind of like the fact that
the v3 patch gets rid of duplicated columns-to-be-made-unique.
Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty,
because it's O(N^2) in the number of columns-to-be-made-unique ...
but realistically, how often is that ever more than a couple?
In the common case where the semijoin has only one join column,
the patch adds basically zero work, which is nice too.
We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.
Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path, but it's such a weird and
probably-useless-in-the-real-world case that I don't think it's
worth expending effort on. Failing outright seems fine.
Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.
Ugh ... I forgot you just refactored all that code.
I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.
regards, tom lane
On 8/26/25 20:26, Tom Lane wrote:
Richard Guo <guofenglinux@gmail.com> writes:
We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path
Unique node over incoming empty rows seems to act as LIMIT 1, in
particular, it breaks subplan execution after extracting the first row.
Then I see v3 patch covers this case perfectly.
--
Best regard,
Maksim Milyutin
On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.
I did consider that, but rejected it. I kind of like the fact that
the v3 patch gets rid of duplicated columns-to-be-made-unique.
Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty,
because it's O(N^2) in the number of columns-to-be-made-unique ...
but realistically, how often is that ever more than a couple?
In the common case where the semijoin has only one join column,
the patch adds basically zero work, which is nice too.
Fair point. I'm now inclined to agree that the v3 patch is the better
approach, as it removes duplicate grouping expressions.
I looked into the first loose end you mentioned earlier and tried to
write a query that would produce duplicate hash keys for HashAggregate
but failed. After checking all the callers of create_agg_path(), it
seems that we now eliminate duplicate grouping keys in all cases.
That commit mentioned the possibility that duplicate columns might
have different hash functions assigned, so we may still need the
changes in the executor. But we need to update the test case with a
new query that produces duplicate hash keys, or remove it if we cannot
find one.
On the other hand, we may want to add a similar query in join.sql to
show that duplicate grouping expressions can be removed during
unique-ification.
We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.
Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path, but it's such a weird and
probably-useless-in-the-real-world case that I don't think it's
worth expending effort on. Failing outright seems fine.
Yeah. When computing semi_rhs_exprs in compute_semijoin_info(), we
punt if we find that there are no columns to unique-ify.
/* Punt if we didn't find at least one column to unique-ify */
if (semi_rhs_exprs == NIL)
return;
I think the case we're discussing here -- where all semi_rhs_exprs are
known constants -- is essentially the same in that it leaves us no
columns to unique-ify. So it should be fine to just punt in this case
as well.
So I think we may need to add the following changes:
+ /* If there are no columns left to unique-ify, return NULL. */
+ if (sortPathkeys == NIL && groupClause == NIL)
+ {
+ MemoryContextSwitchTo(oldcontext);
+ return NULL;
+ }
+
/* build unique paths based on input rel's pathlist */
Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.
Ugh ... I forgot you just refactored all that code.
I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.
Yeah, inventing new stuff in v18 at this point seems risky. I think
backing out a3179ab69 should be fine for v18.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.
Yeah, inventing new stuff in v18 at this point seems risky. I think
backing out a3179ab69 should be fine for v18.
I had a go at doing that (attached) and the news is not good:
it dumps core in some regression tests added by the SJE patch,
seemingly because there are still references to the removed relid.
Now it might be that I failed to correctly adjust the places
where the revert didn't apply because of SJE-related changes.
But I think what is more likely is that the SJE code is not
bothering to update relids everywhere because it's expecting
a3179ab69's "rebuild" logic to take care of things later.
This leaves us with some pretty unappetizing choices about what
to do in v18:
1. Try to emulate the proposed HEAD fix.
2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.
3. Revert SJE as well as a3179ab69 in v18.
I will have a look at #1. If there's somebody who wants to look
at #2, feel free, but it won't be me because I don't understand
the SJE patch well enough. Either way, it's not great to be
doing stuff like this just days before rc1 ...
regards, tom lane
Attachments:
revert-a3179ab69-doesnt-work.patchtext/x-diff; charset=us-ascii; name=revert-a3179ab69-doesnt-work.patchDownload
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 441f12f6c50..54e0b5cbdd1 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1453,8 +1453,7 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
* For the moment we force all the Vars to be available at all join nodes
* for this eclass. Perhaps this could be improved by doing some
* pre-analysis of which members we prefer to join, but it's no worse than
- * what happened in the pre-8.3 code. (Note: rebuild_eclass_attr_needed
- * needs to match this code.)
+ * what happened in the pre-8.3 code.
*/
foreach(lc, ec->ec_members)
{
@@ -2561,51 +2560,6 @@ reconsider_full_join_clause(PlannerInfo *root, OuterJoinClauseInfo *ojcinfo)
return false; /* failed to make any deduction */
}
-/*
- * rebuild_eclass_attr_needed
- * Put back attr_needed bits for Vars/PHVs needed for join eclasses.
- *
- * This is used to rebuild attr_needed/ph_needed sets after removal of a
- * useless outer join. It should match what
- * generate_base_implied_equalities_no_const did, except that we call
- * add_vars_to_attr_needed not add_vars_to_targetlist.
- */
-void
-rebuild_eclass_attr_needed(PlannerInfo *root)
-{
- ListCell *lc;
-
- foreach(lc, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc);
-
- /*
- * We don't expect any EC child members to exist at this point. Ensure
- * that's the case, otherwise, we might be getting asked to do
- * something this function hasn't been coded for.
- */
- Assert(ec->ec_childmembers == NULL);
-
- /* Need do anything only for a multi-member, no-const EC. */
- if (list_length(ec->ec_members) > 1 && !ec->ec_has_const)
- {
- ListCell *lc2;
-
- foreach(lc2, ec->ec_members)
- {
- EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
- List *vars = pull_var_clause((Node *) cur_em->em_expr,
- PVC_RECURSE_AGGREGATES |
- PVC_RECURSE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
-
- add_vars_to_attr_needed(root, vars, ec->ec_relids);
- list_free(vars);
- }
- }
- }
-}
-
/*
* find_join_domain
* Find the highest JoinDomain enclosed within the given relid set.
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 4455a046d23..11ce7e33573 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -28,7 +28,6 @@
#include "optimizer/optimizer.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
-#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
#include "rewrite/rewriteManip.h"
@@ -344,6 +343,37 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
sjinfo->ojrelid);
}
+ /*
+ * Remove references to the rel from other baserels' attr_needed arrays.
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ /* no point in processing target rel itself */
+ if (otherrel == rel)
+ continue;
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ otherrel->attr_needed[attroff] =
+ bms_del_member(otherrel->attr_needed[attroff], relid);
+ if (sjinfo != NULL)
+ otherrel->attr_needed[attroff] =
+ bms_del_member(otherrel->attr_needed[attroff],
+ sjinfo->ojrelid);
+ }
+ }
+
/*
* Likewise remove references from SpecialJoinInfo data structures.
*
@@ -451,11 +481,13 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at,
sjinfo->ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- /* Reduce ph_needed to contain only "relation 0"; see below */
- if (bms_is_member(0, phinfo->ph_needed))
- phinfo->ph_needed = bms_make_singleton(0);
- else
- phinfo->ph_needed = NULL;
+
+ phinfo->ph_needed = adjust_relid_set(phinfo->ph_needed,
+ relid, subst);
+ if (sjinfo != NULL)
+ phinfo->ph_needed = adjust_relid_set(phinfo->ph_needed,
+ sjinfo->ojrelid, subst);
+ /* ph_needed might or might not become empty */
phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
@@ -490,46 +522,6 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
(sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
remove_rel_from_eclass(ec, sjinfo, relid, subst);
}
-
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets and
- * lateral_vars lists. (We already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
-
- if (subst > 0)
- ChangeVarNodesExtended((Node *) otherrel->lateral_vars, relid,
- subst, 0, replace_relid_callback);
- }
}
/*
@@ -626,7 +618,7 @@ remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
*/
/*
- * Now remove the rel from the baserel array to prevent it from being
+ * Finally, remove the rel from the baserel array to prevent it from being
* referenced again. (We can't do this earlier because
* remove_join_clause_from_rels will touch it.)
*/
@@ -635,15 +627,6 @@ remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
-
- /*
- * Now repeat construction of attr_needed bits coming from all other
- * sources.
- */
- rebuild_placeholder_attr_needed(root);
- rebuild_joinclause_attr_needed(root);
- rebuild_eclass_attr_needed(root);
- rebuild_lateral_attr_needed(root);
}
/*
@@ -1984,16 +1967,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* And nuke the RelOptInfo, just in case there's another access path. */
pfree(toRemove);
-
-
- /*
- * Now repeat construction of attr_needed bits coming from all other
- * sources.
- */
- rebuild_placeholder_attr_needed(root);
- rebuild_joinclause_attr_needed(root);
- rebuild_eclass_attr_needed(root);
- rebuild_lateral_attr_needed(root);
}
/*
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..0ed6feb8cdd 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -275,8 +275,6 @@ build_base_rel_tlists(PlannerInfo *root, List *final_tlist)
* have a single owning relation; we keep their attr_needed info in
* root->placeholder_list instead. Find or create the associated
* PlaceHolderInfo entry, and update its ph_needed.
- *
- * See also add_vars_to_attr_needed.
*/
void
add_vars_to_targetlist(PlannerInfo *root, List *vars,
@@ -330,62 +328,6 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
-/*
- * add_vars_to_attr_needed
- * This does a subset of what add_vars_to_targetlist does: it just
- * updates attr_needed for Vars and ph_needed for PlaceHolderVars.
- * We assume the Vars are already in their relations' targetlists.
- *
- * This is used to rebuild attr_needed/ph_needed sets after removal
- * of a useless outer join. The removed join clause might have been
- * the only upper-level use of some other relation's Var, in which
- * case we can reduce that Var's attr_needed and thereby possibly
- * open the door to further join removals. But we can't tell that
- * without tedious reconstruction of the attr_needed data.
- *
- * Note that if a Var's attr_needed is successfully reduced to empty,
- * it will still be in the relation's targetlist even though we do
- * not really need the scan plan node to emit it. The extra plan
- * inefficiency seems tiny enough to not be worth spending planner
- * cycles to get rid of it.
- */
-void
-add_vars_to_attr_needed(PlannerInfo *root, List *vars,
- Relids where_needed)
-{
- ListCell *temp;
-
- Assert(!bms_is_empty(where_needed));
-
- foreach(temp, vars)
- {
- Node *node = (Node *) lfirst(temp);
-
- if (IsA(node, Var))
- {
- Var *var = (Var *) node;
- RelOptInfo *rel = find_base_rel(root, var->varno);
- int attno = var->varattno;
-
- if (bms_is_subset(where_needed, rel->relids))
- continue;
- Assert(attno >= rel->min_attr && attno <= rel->max_attr);
- attno -= rel->min_attr;
- rel->attr_needed[attno] = bms_add_members(rel->attr_needed[attno],
- where_needed);
- }
- else if (IsA(node, PlaceHolderVar))
- {
- PlaceHolderVar *phv = (PlaceHolderVar *) node;
- PlaceHolderInfo *phinfo = find_placeholder_info(root, phv);
-
- phinfo->ph_needed = bms_add_members(phinfo->ph_needed,
- where_needed);
- }
- else
- elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
- }
-}
/*****************************************************************************
*
@@ -788,54 +730,10 @@ extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex)
*/
add_vars_to_targetlist(root, newvars, where_needed);
- /*
- * Remember the lateral references for rebuild_lateral_attr_needed and
- * create_lateral_join_info.
- */
+ /* Remember the lateral references for create_lateral_join_info */
brel->lateral_vars = newvars;
}
-/*
- * rebuild_lateral_attr_needed
- * Put back attr_needed bits for Vars/PHVs needed for lateral references.
- *
- * This is used to rebuild attr_needed/ph_needed sets after removal of a
- * useless outer join. It should match what find_lateral_references did,
- * except that we call add_vars_to_attr_needed not add_vars_to_targetlist.
- */
-void
-rebuild_lateral_attr_needed(PlannerInfo *root)
-{
- Index rti;
-
- /* We need do nothing if the query contains no LATERAL RTEs */
- if (!root->hasLateralRTEs)
- return;
-
- /* Examine the same baserels that find_lateral_references did */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *brel = root->simple_rel_array[rti];
- Relids where_needed;
-
- if (brel == NULL)
- continue;
- if (brel->reloptkind != RELOPT_BASEREL)
- continue;
-
- /*
- * We don't need to repeat all of extract_lateral_references, since it
- * kindly saved the extracted Vars/PHVs in lateral_vars.
- */
- if (brel->lateral_vars == NIL)
- continue;
-
- where_needed = bms_make_singleton(rti);
-
- add_vars_to_attr_needed(root, brel->lateral_vars, where_needed);
- }
-}
-
/*
* create_lateral_join_info
* Fill in the per-base-relation direct_lateral_relids, lateral_relids
@@ -2789,9 +2687,6 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
* var propagation is ensured by making ojscope include input rels from
* both sides of the join.
*
- * See also rebuild_joinclause_attr_needed, which has to partially repeat
- * this work after removal of an outer join.
- *
* Note: if the clause gets absorbed into an EquivalenceClass then this
* may be unnecessary, but for now we have to do it to cover the case
* where the EC becomes ec_broken and we end up reinserting the original
@@ -3401,11 +3296,6 @@ process_implied_equality(PlannerInfo *root,
* some of the Vars could have missed having that done because they only
* appeared in single-relation clauses originally. So do it here for
* safety.
- *
- * See also rebuild_joinclause_attr_needed, which has to partially repeat
- * this work after removal of an outer join. (Since we will put this
- * clause into the joininfo lists, that function needn't do any extra work
- * to find it.)
*/
if (bms_membership(relids) == BMS_MULTIPLE)
{
@@ -3547,72 +3437,6 @@ get_join_domain_min_rels(PlannerInfo *root, Relids domain_relids)
}
-/*
- * rebuild_joinclause_attr_needed
- * Put back attr_needed bits for Vars/PHVs needed for join clauses.
- *
- * This is used to rebuild attr_needed/ph_needed sets after removal of a
- * useless outer join. It should match what distribute_qual_to_rels did,
- * except that we call add_vars_to_attr_needed not add_vars_to_targetlist.
- */
-void
-rebuild_joinclause_attr_needed(PlannerInfo *root)
-{
- /*
- * We must examine all join clauses, but there's no value in processing
- * any join clause more than once. So it's slightly annoying that we have
- * to find them via the per-base-relation joininfo lists. Avoid duplicate
- * processing by tracking the rinfo_serial numbers of join clauses we've
- * already seen. (This doesn't work for is_clone clauses, so we must
- * waste effort on them.)
- */
- Bitmapset *seen_serials = NULL;
- Index rti;
-
- /* Scan all baserels for join clauses */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *brel = root->simple_rel_array[rti];
- ListCell *lc;
-
- if (brel == NULL)
- continue;
- if (brel->reloptkind != RELOPT_BASEREL)
- continue;
-
- foreach(lc, brel->joininfo)
- {
- RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- Relids relids = rinfo->required_relids;
-
- if (!rinfo->is_clone) /* else serial number is not unique */
- {
- if (bms_is_member(rinfo->rinfo_serial, seen_serials))
- continue; /* saw it already */
- seen_serials = bms_add_member(seen_serials,
- rinfo->rinfo_serial);
- }
-
- if (bms_membership(relids) == BMS_MULTIPLE)
- {
- List *vars = pull_var_clause((Node *) rinfo->clause,
- PVC_RECURSE_AGGREGATES |
- PVC_RECURSE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
- Relids where_needed;
-
- if (rinfo->is_clone)
- where_needed = bms_intersect(relids, root->all_baserels);
- else
- where_needed = relids;
- add_vars_to_attr_needed(root, vars, where_needed);
- list_free(vars);
- }
- }
- }
-}
-
-
/*
* match_foreign_keys_to_quals
* Match foreign-key constraints to equivalence classes and join quals
diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
index e1cd00a72fb..d1ba8923b26 100644
--- a/src/backend/optimizer/util/placeholder.c
+++ b/src/backend/optimizer/util/placeholder.c
@@ -314,33 +314,6 @@ fix_placeholder_input_needed_levels(PlannerInfo *root)
}
}
-/*
- * rebuild_placeholder_attr_needed
- * Put back attr_needed bits for Vars/PHVs needed in PlaceHolderVars.
- *
- * This is used to rebuild attr_needed/ph_needed sets after removal of a
- * useless outer join. It should match what
- * fix_placeholder_input_needed_levels did, except that we call
- * add_vars_to_attr_needed not add_vars_to_targetlist.
- */
-void
-rebuild_placeholder_attr_needed(PlannerInfo *root)
-{
- ListCell *lc;
-
- foreach(lc, root->placeholder_list)
- {
- PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
- List *vars = pull_var_clause((Node *) phinfo->ph_var->phexpr,
- PVC_RECURSE_AGGREGATES |
- PVC_RECURSE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
-
- add_vars_to_attr_needed(root, vars, phinfo->ph_eval_at);
- list_free(vars);
- }
-}
-
/*
* add_placeholders_to_base_rels
* Add any required PlaceHolderVars to base rels' targetlists.
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..e0924e622d4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -129,7 +129,6 @@ extern bool process_equivalence(PlannerInfo *root,
extern Expr *canonicalize_ec_expression(Expr *expr,
Oid req_type, Oid req_collation);
extern void reconsider_outer_join_clauses(PlannerInfo *root);
-extern void rebuild_eclass_attr_needed(PlannerInfo *root);
extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
Expr *expr,
List *opfamilies,
diff --git a/src/include/optimizer/placeholder.h b/src/include/optimizer/placeholder.h
index db92d8861ba..a729e73d561 100644
--- a/src/include/optimizer/placeholder.h
+++ b/src/include/optimizer/placeholder.h
@@ -23,7 +23,6 @@ extern PlaceHolderInfo *find_placeholder_info(PlannerInfo *root,
PlaceHolderVar *phv);
extern void find_placeholders_in_jointree(PlannerInfo *root);
extern void fix_placeholder_input_needed_levels(PlannerInfo *root);
-extern void rebuild_placeholder_attr_needed(PlannerInfo *root);
extern void add_placeholders_to_base_rels(PlannerInfo *root);
extern void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *outer_rel, RelOptInfo *inner_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..e3bf9db6b6e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,11 +73,8 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
-extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
- Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
-extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
extern bool restriction_is_always_true(PlannerInfo *root,
@@ -101,7 +98,6 @@ extern RestrictInfo *build_implied_join_equality(PlannerInfo *root,
Expr *item2,
Relids qualscope,
Index security_level);
-extern void rebuild_joinclause_attr_needed(PlannerInfo *root);
extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 390aabfb34b..336def509bc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5853,13 +5853,11 @@ CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
CREATE TEMP TABLE c (id int PRIMARY KEY);
CREATE TEMP TABLE d (a int, b int);
-CREATE TEMP TABLE e (id1 int, id2 int, PRIMARY KEY(id1, id2));
INSERT INTO a VALUES (0, 0), (1, NULL);
INSERT INTO b VALUES (0, 0), (1, NULL);
INSERT INTO c VALUES (0), (1);
INSERT INTO d VALUES (1,3), (2,2), (3,1);
-INSERT INTO e VALUES (0,0), (2,2), (3,1);
--- all these cases should be optimizable into a simple seqscan
+-- all three cases should be optimizable into a simple seqscan
explain (costs off) SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id;
QUERY PLAN
---------------
@@ -5880,14 +5878,6 @@ explain (costs off)
Seq Scan on a
(1 row)
-explain (costs off)
- SELECT a.* FROM a LEFT JOIN b ON a.id = b.id
- LEFT JOIN e ON e.id1 = a.b_id AND b.c_id = e.id2;
- QUERY PLAN
----------------
- Seq Scan on a
-(1 row)
-
-- check optimization of outer join within another special join
explain (costs off)
select id from a where id in (
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index f6e7070db65..2850c20eb37 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2086,22 +2086,17 @@ CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
CREATE TEMP TABLE c (id int PRIMARY KEY);
CREATE TEMP TABLE d (a int, b int);
-CREATE TEMP TABLE e (id1 int, id2 int, PRIMARY KEY(id1, id2));
INSERT INTO a VALUES (0, 0), (1, NULL);
INSERT INTO b VALUES (0, 0), (1, NULL);
INSERT INTO c VALUES (0), (1);
INSERT INTO d VALUES (1,3), (2,2), (3,1);
-INSERT INTO e VALUES (0,0), (2,2), (3,1);
--- all these cases should be optimizable into a simple seqscan
+-- all three cases should be optimizable into a simple seqscan
explain (costs off) SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id;
explain (costs off) SELECT b.* FROM b LEFT JOIN c ON b.c_id = c.id;
explain (costs off)
SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id)
ON (a.b_id = b.id);
-explain (costs off)
- SELECT a.* FROM a LEFT JOIN b ON a.id = b.id
- LEFT JOIN e ON e.id1 = a.b_id AND b.c_id = e.id2;
-- check optimization of outer join within another special join
explain (costs off)
I wrote:
This leaves us with some pretty unappetizing choices about what
to do in v18:
1. Try to emulate the proposed HEAD fix.
Actually, a preliminary look suggests that we can approximate
that quite well, because v18's create_unique_path() is already
responsible for preparing the list of columns to be de-duplicated
by a UniquePath. So we can stick the same logic about whether
make_pathkeys_for_sortclauses believes that each column adds something
into create_unique_path(). Unlike the case with HEAD, that doesn't
overlap with any useful work the function was already doing, but
I still think that the cost will be negligible in normal cases.
I'll try to have a patch along those lines by tomorrow.
regards, tom lane
On Thu, Aug 28, 2025 at 6:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This leaves us with some pretty unappetizing choices about what
to do in v18:1. Try to emulate the proposed HEAD fix.
2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.3. Revert SJE as well as a3179ab69 in v18.
I will have a look at #1. If there's somebody who wants to look
at #2, feel free, but it won't be me because I don't understand
the SJE patch well enough. Either way, it's not great to be
doing stuff like this just days before rc1 ...
I took a look at #2. I don't have a good understanding of the SJE
patch either, so I might be missing something.
The core dump you mentioned can be reproduced with this query.
create table sj (a int unique, b int);
explain (verbose, costs off)
select t3.a from sj t1
join sj t2 on t1.a = t2.a
join lateral (select t1.a offset 0) t3 on true;
The reason is that remove_rel_from_query() does not update baserels'
lateral_vars lists and thus there are still references to the removed
relid there.
After fixing that issue, another error occurs during the regression
tests.
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
ERROR: variable not found in subplan target lists
The reason is that remove_rel_from_query() removes the old relid from
the attr_needed arrays but fails to add the substitute relid to them.
Hence, attached is the fix for SJE after reverting a3179ab69. With
it, all regression tests pass.
Thanks
Richard
Attachments:
fix-SJE-after-revert-a3179ab69.patchapplication/octet-stream; name=fix-SJE-after-revert-a3179ab69.patchDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 11ce7e33573..cd8a1ee26a9 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -344,7 +344,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
}
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays and
+ * lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -366,12 +367,16 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ adjust_relid_set(otherrel->attr_needed[attroff], relid, subst);
if (sjinfo != NULL)
otherrel->attr_needed[attroff] =
bms_del_member(otherrel->attr_needed[attroff],
sjinfo->ojrelid);
}
+
+ if (subst > 0)
+ ChangeVarNodesExtended((Node *) otherrel->lateral_vars, relid,
+ subst, 0, replace_relid_callback);
}
/*
Richard Guo <guofenglinux@gmail.com> writes:
On Thu, Aug 28, 2025 at 6:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.
I took a look at #2. I don't have a good understanding of the SJE
patch either, so I might be missing something.
Thanks for looking at that!
Hence, attached is the fix for SJE after reverting a3179ab69. With
it, all regression tests pass.
I think that these are bugs/oversights in SJE that we should probably
fix in HEAD and v18, even if we're not going to revert a3179ab69.
We don't have strong reason to believe that they're not reachable
some other way.
Attached are a finished version of my v3 patch for HEAD, and the
promised adaptation of it for v18. The only surprise I ran into
was that I had to adopt the same sort of copy-from-the-parent
logic as you have in HEAD in create_unique_paths, because trying
to derive a child's list independently runs into assertion failures
inside make_pathkeys_for_sortclauses. In retrospect, probably
I shouldn't have been surprised.
With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now. (For the same reason, I suggest pushing
those SJE corrections sooner not later.)
regards, tom lane
Attachments:
v5-fix-variable-not-found-in-subplan-target-lists-master.patchtext/x-diff; charset=us-ascii; name*0=v5-fix-variable-not-found-in-subplan-target-lists-master.pa; name*1=tchDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 65f17101591..a8c8edfac75 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -8284,8 +8284,8 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
* the needs of the semijoin represented by sjinfo. If it is not possible
* to identify how to make the data unique, NULL is returned.
*
- * If used at all, this is likely to be called repeatedly on the same rel;
- * So we cache the result.
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
*/
RelOptInfo *
create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
@@ -8375,9 +8375,14 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
* variables of the input rel's targetlist. We have to add any such
* expressions to the unique rel's targetlist.
*
- * While in the loop, build the lists of SortGroupClause's that
- * represent the ordering for the sort-based implementation and the
- * grouping for the hash-based implementation.
+ * To complicate matters, some of the values to be unique-ified may be
+ * known redundant by the EquivalenceClass machinery (e.g., because
+ * they have been equated to constants). There is no need to compare
+ * such values during unique-ification, and indeed we had better not
+ * try because the Vars involved may not have propagated as high as
+ * the semijoin's level. We use make_pathkeys_for_sortclauses to
+ * detect such cases, which is a tad inefficient but it doesn't seem
+ * worth building specialized infrastructure for this.
*/
newtlist = make_tlist_from_pathtarget(rel->reltarget);
nextresno = list_length(newtlist) + 1;
@@ -8386,8 +8391,9 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
{
Expr *uniqexpr = lfirst(lc1);
Oid in_oper = lfirst_oid(lc2);
- Oid sortop = InvalidOid;
+ Oid sortop;
TargetEntry *tle;
+ bool made_tle = false;
tle = tlist_member(uniqexpr, newtlist);
if (!tle)
@@ -8398,19 +8404,21 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
false);
newtlist = lappend(newtlist, tle);
nextresno++;
+ made_tle = true;
}
- if (sjinfo->semi_can_btree)
+ /*
+ * Try to build an ORDER BY list to sort the input compatibly. We
+ * do this for each sortable clause even when the clauses are not
+ * all sortable, so that we can detect clauses that are redundant
+ * according to the pathkey machinery.
+ */
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (OidIsValid(sortop))
{
- /* Create an ORDER BY list to sort the input compatibly */
Oid eqop;
SortGroupClause *sortcl;
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
/*
* The Unique node will need equality operators. Normally
* these are the same as the IN clause operators, but if those
@@ -8430,7 +8438,32 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
sortcl->nulls_first = false;
sortcl->hashable = false; /* no need to make this accurate */
sortList = lappend(sortList, sortcl);
+
+ /*
+ * At each step, convert the SortGroupClause list to pathkey
+ * form. If the just-added SortGroupClause is redundant, the
+ * result will be shorter than the SortGroupClause list.
+ */
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList,
+ newtlist);
+ if (list_length(sortPathkeys) != list_length(sortList))
+ {
+ /* Drop the redundant SortGroupClause */
+ sortList = list_delete_last(sortList);
+ Assert(list_length(sortPathkeys) == list_length(sortList));
+ /* Undo tlist addition, if we made one */
+ if (made_tle)
+ {
+ newtlist = list_delete_last(newtlist);
+ nextresno--;
+ }
+ /* We need not consider this clause for hashing, either */
+ continue;
+ }
}
+ else if (sjinfo->semi_can_btree) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
if (sjinfo->semi_can_hash)
{
@@ -8460,8 +8493,27 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
}
}
+ /*
+ * Done building the sortPathkeys and groupClause. But the
+ * sortPathkeys are bogus if not all the clauses were sortable.
+ */
+ if (!sjinfo->semi_can_btree)
+ sortPathkeys = NIL;
+
+ /*
+ * It can happen that all the RHS columns are equated to constants.
+ * We'd have to do something special to unique-ify in that case, and
+ * it's such an unlikely-in-the-real-world case that it's not worth
+ * the effort. So just punt if we found no columns to unique-ify.
+ */
+ if (sortPathkeys == NIL && groupClause == NIL)
+ {
+ MemoryContextSwitchTo(oldcontext);
+ return NULL;
+ }
+
+ /* Convert the required targetlist back to PathTarget form */
unique_rel->reltarget = create_pathtarget(root, newtlist);
- sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
}
/* build unique paths based on input rel's pathlist */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7319945ffe3..c35288eecde 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3398,26 +3398,6 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
ba | 0 | 1
(2 rows)
--- Make sure that generation of HashAggregate for uniqification purposes
--- does not lead to array overflow due to unexpected duplicate hash keys
--- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
-set enable_memoize to off;
-explain (costs off)
- select 1 from tenk1
- where (hundred, thousand) in (select twothousand, twothousand from onek);
- QUERY PLAN
--------------------------------------------------------------
- Hash Join
- Hash Cond: (tenk1.hundred = onek.twothousand)
- -> Seq Scan on tenk1
- Filter: (hundred = thousand)
- -> Hash
- -> HashAggregate
- Group Key: onek.twothousand, onek.twothousand
- -> Seq Scan on onek
-(8 rows)
-
-reset enable_memoize;
--
-- Hash Aggregation Spill tests
--
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 98b05c94a11..b26b8c5bdbe 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3222,6 +3222,24 @@ where b.unique2 is null;
-> Index Only Scan using tenk1_unique2 on tenk1 b
(5 rows)
+-- check that we avoid de-duplicating columns redundantly
+set enable_memoize to off;
+explain (costs off)
+select 1 from tenk1
+where (hundred, thousand) in (select twothousand, twothousand from onek);
+ QUERY PLAN
+-------------------------------------------------
+ Hash Join
+ Hash Cond: (tenk1.hundred = onek.twothousand)
+ -> Seq Scan on tenk1
+ Filter: (hundred = thousand)
+ -> Hash
+ -> HashAggregate
+ Group Key: onek.twothousand
+ -> Seq Scan on onek
+(8 rows)
+
+reset enable_memoize;
--
-- regression test for bogus RTE_GROUP entries
--
@@ -6500,6 +6518,68 @@ where t1.a = s.c;
----------
(0 rows)
+rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a
+ Inner Unique: true
+ -> Nested Loop
+ Output: t1.a, t5.a
+ -> Index Only Scan using t_a_key on pg_temp.t t1
+ Output: t1.a
+ Index Cond: (t1.a = 1)
+ -> HashAggregate
+ Output: t5.a
+ Group Key: t5.a
+ -> Hash Join
+ Output: t5.a
+ Hash Cond: (t6.b = t4.b)
+ -> Seq Scan on pg_temp.t t6
+ Output: t6.a, t6.b
+ -> Hash
+ Output: t4.b, t5.b, t5.a
+ -> Hash Join
+ Output: t4.b, t5.b, t5.a
+ Inner Unique: true
+ Hash Cond: (t5.b = t4.b)
+ -> Seq Scan on pg_temp.t t5
+ Output: t5.a, t5.b
+ -> Hash
+ Output: t4.b, t4.a
+ -> Index Scan using t_a_key on pg_temp.t t4
+ Output: t4.b, t4.a
+ Index Cond: (t4.a = 1)
+ -> Index Only Scan using t_a_key on pg_temp.t t3
+ Output: t3.a
+ Index Cond: (t3.a = t5.a)
+(32 rows)
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ a
+---
+ 1
+(1 row)
+
rollback;
-- test cases where we can remove a join, but not a PHV computed at it
begin;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index dde85d0dfb2..62540b1ffa4 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1510,15 +1510,6 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
from unnest(array['a','b']) u(v)
group by v||'a' order by 1;
--- Make sure that generation of HashAggregate for uniqification purposes
--- does not lead to array overflow due to unexpected duplicate hash keys
--- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
-set enable_memoize to off;
-explain (costs off)
- select 1 from tenk1
- where (hundred, thousand) in (select twothousand, twothousand from onek);
-reset enable_memoize;
-
--
-- Hash Aggregation Spill tests
--
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5f0a475894d..bccd171afb6 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -839,6 +839,13 @@ explain (costs off)
select a.* from tenk1 a left join tenk1 b on a.unique1 = b.unique2
where b.unique2 is null;
+-- check that we avoid de-duplicating columns redundantly
+set enable_memoize to off;
+explain (costs off)
+select 1 from tenk1
+where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_memoize;
+
--
-- regression test for bogus RTE_GROUP entries
--
@@ -2420,6 +2427,32 @@ where t1.a = s.c;
rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+rollback;
+
-- test cases where we can remove a join, but not a PHV computed at it
begin;
v5-fix-variable-not-found-in-subplan-target-lists-v18.patchtext/x-diff; charset=us-ascii; name=v5-fix-variable-not-found-in-subplan-target-lists-v18.patchDownload
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..6d6e67f84e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -19,6 +19,7 @@
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/extensible.h"
+#include "nodes/makefuncs.h"
#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
@@ -27,7 +28,9 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/tlist.h"
+#include "parser/parse_clause.h"
#include "parser/parsetree.h"
+#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/selfuncs.h"
@@ -1727,6 +1730,8 @@ UniquePath *
create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
SpecialJoinInfo *sjinfo)
{
+ List *uniq_exprs;
+ List *in_operators;
UniquePath *pathnode;
Path sort_path; /* dummy for result of cost_sort */
Path agg_path; /* dummy for result of cost_agg */
@@ -1760,6 +1765,134 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
*/
oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+ /*
+ * First, identify the columns/expressions to be made unique along with
+ * the associated equality operators. We made lists of these when the
+ * SpecialJoinInfo was created, but that was before constructing
+ * EquivalenceClasses. Some of the values to be unique-ified may now be
+ * known redundant by the EquivalenceClass machinery (e.g., because they
+ * have been equated to constants). There is no need to compare such
+ * values during unique-ification, and indeed we had better not try
+ * because the Vars involved may not have propagated as high as the
+ * semijoin's level. We use make_pathkeys_for_sortclauses to detect such
+ * cases, which is a tad inefficient but it doesn't seem worth building
+ * specialized infrastructure for this.
+ *
+ * For a child rel, we can construct these lists from those of its parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ UniquePath *parent_path = (UniquePath *) rel->top_parent->cheapest_unique_path;
+
+ Assert(parent_path && IsA(parent_path, UniquePath));
+ uniq_exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_path->uniq_exprs,
+ rel,
+ rel->top_parent);
+ in_operators = copyObject(parent_path->in_operators);
+ }
+ else
+ {
+ List *newtlist;
+ List *sortList;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ uniq_exprs = in_operators = newtlist = sortList = NIL;
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop;
+
+ /*
+ * Try to build an ORDER BY list to sort the input compatibly. We
+ * do this for each sortable clause even when the clauses are not
+ * all sortable, so that we can detect clauses that are redundant
+ * according to the pathkey machinery.
+ */
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (OidIsValid(sortop))
+ {
+ Oid eqop;
+ TargetEntry *tle;
+ SortGroupClause *sortcl;
+ List *sortPathkeys;
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ list_length(newtlist) + 1,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+
+ /*
+ * At each step, convert the SortGroupClause list to pathkey
+ * form. If the just-added SortGroupClause is redundant, the
+ * result will be shorter than the SortGroupClause list.
+ */
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList,
+ newtlist);
+ if (list_length(sortPathkeys) != list_length(sortList))
+ {
+ /* Drop the redundant SortGroupClause */
+ sortList = list_delete_last(sortList);
+ Assert(list_length(sortPathkeys) == list_length(sortList));
+ /* Undo tlist addition too */
+ newtlist = list_delete_last(newtlist);
+ /* Don't need this column */
+ continue;
+ }
+ }
+ else if (sjinfo->semi_can_btree) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * We need to include this column in the output list.
+ *
+ * Under GEQO and when planning child joins, the sjinfo might be
+ * short-lived, so we'd better make copies of data structures we
+ * extract from it.
+ */
+ uniq_exprs = lappend(uniq_exprs, copyObject(uniqexpr));
+ in_operators = lappend_oid(in_operators, in_oper);
+ }
+
+ /*
+ * It can happen that all the RHS columns are equated to constants.
+ * We'd have to do something special to unique-ify in that case, and
+ * it's such an unlikely-in-the-real-world case that it's not worth
+ * the effort. So just punt if we found no columns to unique-ify.
+ */
+ if (uniq_exprs == NIL)
+ {
+ MemoryContextSwitchTo(oldcontext);
+ return NULL;
+ }
+ }
+
+ /* OK, build the path node */
pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
@@ -1778,14 +1911,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pathnode->path.pathkeys = NIL;
pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
+ pathnode->in_operators = in_operators;
+ pathnode->uniq_exprs = uniq_exprs;
/*
* If the input is a relation and it has a unique index that proves the
@@ -1795,8 +1922,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
*/
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ uniq_exprs,
+ in_operators))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
@@ -1829,13 +1956,13 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
{
List *sub_tlist_colnos;
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
+ sub_tlist_colnos = translate_sub_tlist(uniq_exprs,
rel->relid);
if (sub_tlist_colnos &&
query_is_distinct_for(rte->subquery,
sub_tlist_colnos,
- sjinfo->semi_operators))
+ in_operators))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
@@ -1855,11 +1982,11 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
/* Estimate number of output rows */
pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
+ uniq_exprs,
rel->rows,
NULL,
NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
+ numCols = list_length(uniq_exprs);
if (sjinfo->semi_can_btree)
{
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1f1ce2380af..4cfbe424603 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3379,26 +3379,6 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
ba | 0 | 1
(2 rows)
--- Make sure that generation of HashAggregate for uniqification purposes
--- does not lead to array overflow due to unexpected duplicate hash keys
--- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
-set enable_memoize to off;
-explain (costs off)
- select 1 from tenk1
- where (hundred, thousand) in (select twothousand, twothousand from onek);
- QUERY PLAN
--------------------------------------------------------------
- Hash Join
- Hash Cond: (tenk1.hundred = onek.twothousand)
- -> Seq Scan on tenk1
- Filter: (hundred = thousand)
- -> Hash
- -> HashAggregate
- Group Key: onek.twothousand, onek.twothousand
- -> Seq Scan on onek
-(8 rows)
-
-reset enable_memoize;
--
-- Hash Aggregation Spill tests
--
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 390aabfb34b..a30ff88eef8 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3222,6 +3222,24 @@ where b.unique2 is null;
-> Index Only Scan using tenk1_unique2 on tenk1 b
(5 rows)
+-- check that we avoid de-duplicating columns redundantly
+set enable_memoize to off;
+explain (costs off)
+select 1 from tenk1
+where (hundred, thousand) in (select twothousand, twothousand from onek);
+ QUERY PLAN
+-------------------------------------------------
+ Hash Join
+ Hash Cond: (tenk1.hundred = onek.twothousand)
+ -> Seq Scan on tenk1
+ Filter: (hundred = thousand)
+ -> Hash
+ -> HashAggregate
+ Group Key: onek.twothousand
+ -> Seq Scan on onek
+(8 rows)
+
+reset enable_memoize;
--
-- regression test for bogus RTE_GROUP entries
--
@@ -6500,6 +6518,68 @@ where t1.a = s.c;
----------
(0 rows)
+rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a
+ Inner Unique: true
+ -> Nested Loop
+ Output: t1.a, t5.a
+ -> Index Only Scan using t_a_key on pg_temp.t t1
+ Output: t1.a
+ Index Cond: (t1.a = 1)
+ -> HashAggregate
+ Output: t5.a
+ Group Key: t5.a
+ -> Hash Join
+ Output: t5.a
+ Hash Cond: (t6.b = t4.b)
+ -> Seq Scan on pg_temp.t t6
+ Output: t6.a, t6.b
+ -> Hash
+ Output: t4.b, t5.b, t5.a
+ -> Hash Join
+ Output: t4.b, t5.b, t5.a
+ Inner Unique: true
+ Hash Cond: (t5.b = t4.b)
+ -> Seq Scan on pg_temp.t t5
+ Output: t5.a, t5.b
+ -> Hash
+ Output: t4.b, t4.a
+ -> Index Scan using t_a_key on pg_temp.t t4
+ Output: t4.b, t4.a
+ Index Cond: (t4.a = 1)
+ -> Index Only Scan using t_a_key on pg_temp.t t3
+ Output: t3.a
+ Index Cond: (t3.a = t5.a)
+(32 rows)
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+ a
+---
+ 1
+(1 row)
+
rollback;
-- test cases where we can remove a join, but not a PHV computed at it
begin;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 277b4b198cc..79eca85c985 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1505,15 +1505,6 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
from unnest(array['a','b']) u(v)
group by v||'a' order by 1;
--- Make sure that generation of HashAggregate for uniqification purposes
--- does not lead to array overflow due to unexpected duplicate hash keys
--- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
-set enable_memoize to off;
-explain (costs off)
- select 1 from tenk1
- where (hundred, thousand) in (select twothousand, twothousand from onek);
-reset enable_memoize;
-
--
-- Hash Aggregation Spill tests
--
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index f6e7070db65..23e220afcd2 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -839,6 +839,13 @@ explain (costs off)
select a.* from tenk1 a left join tenk1 b on a.unique1 = b.unique2
where b.unique2 is null;
+-- check that we avoid de-duplicating columns redundantly
+set enable_memoize to off;
+explain (costs off)
+select 1 from tenk1
+where (hundred, thousand) in (select twothousand, twothousand from onek);
+reset enable_memoize;
+
--
-- regression test for bogus RTE_GROUP entries
--
@@ -2420,6 +2427,32 @@ where t1.a = s.c;
rollback;
+-- check handling of semijoins after join removal: we must suppress
+-- unique-ification of known-constant values
+begin;
+
+create temp table t (a int unique, b int);
+insert into t values (1, 2);
+
+explain (verbose, costs off)
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+select t1.a from t t1
+ left join t t2 on t1.a = t2.a
+ join t t3 on true
+where exists (select 1 from t t4
+ join t t5 on t4.b = t5.b
+ join t t6 on t5.b = t6.b
+ where t1.a = t4.a and t3.a = t5.a and t4.a = 1);
+
+rollback;
+
-- test cases where we can remove a join, but not a PHV computed at it
begin;
On Fri, Aug 29, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached are a finished version of my v3 patch for HEAD, and the
promised adaptation of it for v18. The only surprise I ran into
was that I had to adopt the same sort of copy-from-the-parent
logic as you have in HEAD in create_unique_paths, because trying
to derive a child's list independently runs into assertion failures
inside make_pathkeys_for_sortclauses. In retrospect, probably
I shouldn't have been surprised.With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now.
I reviewed the v5 patch and found one issue. For a child relation, we
shouldn't assume that the parent's unique rel or unique path always
exists. In cases where all RHS columns are equated to constants, we
currently don't build the unique rel/path for the parent table. This
issue results in SIGSEGV for the query below in both v18 and master.
create table p (a int, b int) partition by range(a);
create table p1 partition of p for values from (0) to (10);
create table p2 partition of p for values from (10) to (20);
set enable_partitionwise_join to on;
explain (costs off)
select * from p t1 where exists
(select 1 from p t2 where t1.a = t2.a and t1.a = 1);
server closed the connection unexpectedly
The fix is very simple:
@@ -8307,6 +8307,15 @@ create_unique_paths(PlannerInfo *root,
RelOptInfo *rel, SpecialJoinInfo *sjinfo)
if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
return NULL;
+ /*
+ * Punt if this is a child relation and we failed to build a unique-ified
+ * relation for its parent. This can happen if all the RHS columns are
+ * found to be equated to constants when unique-ifying the parent table,
+ * leaving no columns to unique-ify.
+ */
+ if (IS_OTHER_REL(rel) && rel->top_parent->unique_rel == NULL)
+ return NULL;
+
I plan to push the fix to both v18 and master, if there are no
objections.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
On Fri, Aug 29, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now.
I reviewed the v5 patch and found one issue.
Ah, thanks for spotting that.
I plan to push the fix to both v18 and master, if there are no
objections.
Please do.
regards, tom lane
On Fri, Aug 29, 2025 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
I plan to push the fix to both v18 and master, if there are no
objections.
Please do.
Done.
Thanks
Richard