[BUG] Remove self joins causes 'variable not found in subplan target lists' error

Started by Sergey Soloviev7 months ago20 messages

Jump to latest

Sergey Soloviev

sergey.soloviev@tantorlabs.ru

7 months ago

Hi, hackers!

We have encountered a bug in the planner which raises an error 'variable
not found in target lists'.

To reproduce apply patch '0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patch'
and run this query:

```
create table t1(i int, j int);
create table t2(i int, j int);
create table t3(i int, j int);
create table t4(i int, j int);
create table t5(i int, j int);
create unique index on t2(i, j);

select t1.j from t1
    left join t2 on t1.i = t2.i and t2.j = 1
    left join (
        select i from t3
        where exists (
            select true from t4
                left join t5 as t5 on t5.j = t4.j
                left join t5 as t6 on t6.j = t4.j
            where t4.i = t3.i
                and t4.i =2
            )
        ) t on t1.j = t.i;
```

This bug was caused by 'rebuild_eclass_attr_needed' function which
did not process EC with constants. The logic behind it is clear - we
can substitute these attributes with constants and do not use them
later, but it is not true when `EXISTS` converts to JOIN. If planner
decided to use Unique + Sort, then the following happens:

1. `t4.i`, `t3.i` and `2` are contained in same equivalence class, so
'ec_has_const' is 'true'
2. During 'rebuild_eclass_attr_needed' their EC is skipped because it
contains constant (src/backend/optimizer/path/equivclass.c:2590)

```
if (list_length(ec->ec_members) > 1 && !ec->ec_has_const)
```

3. 'attr_needed' of 't4.i' is still NULL, so it is not added to 'pathtarget'
of 'RelOptInfo' (src/backend/optimizer/util/relnode.c:1199)

```
if (!bms_nonempty_difference(baserel->attr_needed[ndx], relids))
continue; /* nope, skip it */
```

4. 'UniquePath' is created, but `t4.i` (which must be unique expression)
is not in 'pathtarget' of RelOptInfo, so added to path's targetlist
manually (src/backend/optimizer/plan/planner.c:8395)

```
tle = tlist_member(uniqexpr, newtlist);
if (!tle)
{
    tle = makeTargetEntry((Expr *) uniqexpr,
                                                nextresno,
                                                NULL,
                                                false);
    newtlist = lappend(newtlist, tle);
}
```

5. When creating a Plan targetlist is just copied from Path
6. At the very end 'set_plan_references' can not find expression at
level below (because it was "not needed" and was not added to
targetlist), so it throws 'variable not found in subplan target lists'

This patch fixes this error by taking considering multi-member EC
even with constants, but skipping constant members during members
iteration.

There are 4 attachments:

- 'schema.sql' - schema for reproducing the bug
- 'query.sql' - actual query to raise error
- '0001-Disable-JOIN_SEMI-and-JOIN_RIGHT_SEMI-paths.patch' - disable
JOIN_SEMI to be able to reproduce the bug
- '0001-fix-variable-not-found-in-subplan-target-lists.patch' - patch
with actual fix of this bug

I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.

---
Regards,
Sergey Soloviev
Tantor Labs LLC

Richard Guo

guofenglinux@gmail.com

7 months ago

In reply to: Sergey Soloviev (#1)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:

I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.

Nice catch! Here's a query that reproduces the error without needing
to hack the code.

create table t (a int, b int);
create unique index on t (a);

select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists

This bug was introduced by commit:

commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400

Recalculate where-needed data accurately after a join removal.

(I'm a bit surprised it took us so long to discover it.)

When distributing qual clause "t1.a = t4.a", distribute_qual_to_rels
adds t1.a and t4.a that are used in this clause to the targetlists of
their relations. (However, if the clause gets absorbed into an EC
that contains const, this can result in adding Vars to relations that
do not actually require them. But that's a different problem.)

However, when rebuilding attr_needed bits for t1.a and t4.a after the
join removal, rebuild_eclass_attr_needed fails to restore the bits
because it skips ECs that contain constant, as explained by Sergey.
Later, it turns out that t4.a is needed at the join leval t4/t5/t6 for
the unique-ification of the RHS of the semi-join.

The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var -- and if it's not needed at
upper levels, it shouldn't need to be in the targetlist of the unique
path. For example, in the query above, t4.a does not need to be added
in the targetlist of the unique path, right?

Thanks
Richard

Andrei Lepikhov

lepihov@gmail.com

7 months ago

In reply to: Richard Guo (#2)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On 25/8/2025 15:28, Richard Guo wrote:

On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:

I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.

Nice catch! Here's a query that reproduces the error without needing
to hack the code.

create table t (a int, b int);
create unique index on t (a);

select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists

Thanks for your reproduction.
Unfortunately, it works only in the absence of an ANALYZE, like the
original example.
Also, I would say it is not a self-join-related issue. This example
employs the removal of the 'unnecessary left join'. Currently, I'm
unsure why this example causes the issue: the removing t2 table
shouldn't have any references in ECs within the EXISTS part.

--
regards, Andrei Lepikhov

Sergey Soloviev

sergey.soloviev@tantorlabs.ru

7 months ago

In reply to: Andrei Lepikhov (#3)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

25.08.2025 16:59, Andrei Lepikhov:

On 25/8/2025 15:28, Richard Guo wrote:

On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:

I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.

Nice catch! Here's a query that reproduces the error without needing
to hack the code.

create table t (a int, b int);
create unique index on t (a);

select t1.a from t t1
   left join t t2 on t1.a = t2.a
        join t t3 on true
where exists (select 1 from t t4
                 join t t5 on t4.b = t5.b
                 join t t6 on t5.b = t6.b
               where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists

Thanks for your reproduction.
Unfortunately, it works only in the absence of an ANALYZE, like the original example.
Also, I would say it is not a self-join-related issue. This example employs the removal of the 'unnecessary left join'. Currently, I'm unsure why this example causes the issue: the removing t2 table shouldn't have any references in ECs within the EXISTS part.

Hi!

Yes, this is not created by SJE, but this bug introduced by commit adding SJE logic:
first remove any 'attr_needed' (and other info) and then restore it according
to only needed relations.

Provided example shows bug in the code.

'attr_needed' is cleared at src/backend/optimizer/plan/analyzejoins.c:526.
If we dump the state for relation t4, then we will get

attr_needed[a] = {1, 6} /* {t1, t4} */

And also, there is EC = {t1.a, t4.a, 2}. This comes from WHERE in EXISTS:

t1.a = t4.a AND t4.a = 2

But during the second phase (recreating 'attr_needed') we see that EC contains
constant (2), so skip recreating 'attr_needed[a]' for t4, but it previously had t1
in 'attr_needed' which was not removed by join elimination logic. Roughly
speaking, we have lost dependency with t1.

Thus, error is caused not by removing t2 itself, but by the manipulations
involved.
---
Regards,
Sergey Solviev
Tantor Labs LLC

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Richard Guo (#2)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

Richard Guo <guofenglinux@gmail.com> writes:

The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var

Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.

Not sure about a good way to make it account for that though.
Detection of redundancies of this sort is kind of buried in
pathkey construction, but in this context we'd like to find out
earlier: we need to avoid attaching a new tlist entry if the
expression is constant, else we'll still get the failure.

regards, tom lane

Sergey Soloviev

sergey.soloviev@tantorlabs.ru

7 months ago

In reply to: Richard Guo (#2)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

25.08.2025 16:28, Richard Guo пишет:

On Sat, Aug 23, 2025 at 12:27 AM Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:

I would like write a test in 'join.sql', but for now it requires patches
to easily reproduce the bug. I appreciate it if someone could find
an easier way to reproduce the bug and write a simple test.

Nice catch! Here's a query that reproduces the error without needing
to hack the code.

create table t (a int, b int);
create unique index on t (a);

select t1.a from t t1
left join t t2 on t1.a = t2.a
join t t3 on true
where exists (select 1 from t t4
join t t5 on t4.b = t5.b
join t t6 on t5.b = t6.b
where t1.a = t4.a and t3.a = t5.a and t4.a = 2);
ERROR: variable not found in subplan target lists

This bug was introduced by commit:

commit a3179ab692be4314d5ee5cd56598976c487d5ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 27 16:04:04 2024 -0400

Recalculate where-needed data accurately after a join removal.

(I'm a bit surprised it took us so long to discover it.)

When distributing qual clause "t1.a = t4.a", distribute_qual_to_rels
adds t1.a and t4.a that are used in this clause to the targetlists of
their relations. (However, if the clause gets absorbed into an EC
that contains const, this can result in adding Vars to relations that
do not actually require them. But that's a different problem.)

However, when rebuilding attr_needed bits for t1.a and t4.a after the
join removal, rebuild_eclass_attr_needed fails to restore the bits
because it skips ECs that contain constant, as explained by Sergey.
Later, it turns out that t4.a is needed at the join leval t4/t5/t6 for
the unique-ification of the RHS of the semi-join.

The proposed patch can fix this error. However, I'm wondering if we
could address it from the unique-ification side instead. If a Var
we're trying to unique-ify is known to be equal to a constant, then we
shouldn't need to unique-ify that Var -- and if it's not needed at
upper levels, it shouldn't need to be in the targetlist of the unique
path. For example, in the query above, t4.a does not need to be added
in the targetlist of the unique path, right?

Thanks
Richard

Hi, Richard!

Thanks for finding this example. I have added it to join.sql regress test.
New patch version is attached.

I also thought about case to just check Var to be a constant. But
the main question is 'if it is the only node affected?'. Example shows
that error is caused by Unique path, but maybe there are another
nodes which will cause such error.

Yes only 'create_unique_paths' creates new TargetEntry. But I think
the root cause is that when recreating 'attr_needed', the dependency
between this relation and some other relation is lost.
---
Regards,
Sergey Soloviev
Tantor Labs LLC

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Tom Lane (#5)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

I wrote:

Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.

Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.

There are a couple loose ends yet to be dealt with:

* The fact that we now avoid duplicate unique-ifications is good,
but I think it breaks the test case added by 44e95b572, since
that's no longer demonstrating what it set out to demonstrate.
(See the delta in aggregates.out.) I'm not sure that that's a
huge problem, but we probably at least ought to modify the comment
on that test case.

* What happens if we find that *all* the semi_rhs_exprs are known
constant? We'll compute empty pathkeys and groupClause, but then
what? It looks like create_final_unique_paths etc then build a
useless Unique step. Maybe we should just absorb the input
relation's path list as-is?

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Tom Lane (#7)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

BTW, a couple of thoughts came to me after considering this
issue awhile longer:

1. The really fundamental problem is that we don't update
SpecialJoinInfo's semi_operators/semi_rhs_exprs after discovering that
join-level comparisons of a particular value are unnecessary. We will
then not emit the actual join clause ("t1.a = t4.a" in this example),
but we still list t4.a as something that would need unique-ification.
I looked a little bit at whether it'd be reasonable to do that at the
end of construction of EquivalenceClasses, but I think it'd add a lot
more planning cycles than my v3 patch. A more invasive idea could be
to not compute semi_operators/semi_rhs_exprs at all until we've
finished building EquivalenceClasses. I'm not sure if that'd be
adequately cheap either, but maybe it would be.

2. Another thing that's a bit worrisome is the recognition that
a3179ab69's logic for reconstruction of attr_needed might produce
different results than we had to begin with. It's not necessarily
worse: in the particular case we're considering here, the results
are arguably better. But it's scary to think that if there are
any other bugs in that code, we will only find them in queries that
use join removal. That's not a recipe for thorough test coverage.
I wonder if it could be sane to delete the existing logic that builds
attr_needed during initial jointree deconstruction, and instead
always fill attr_needed by running the new "reconstruction" logic.
The trouble with that is we'd have to do it just before starting
join removal, and then again after each successful join removal,
since join removal itself depends on the attr_needed results.
It seems likely that that would net out slower than the current
code. But maybe the difference would be small.

I'm not planning to work on either of these ideas right now,
but I thought I'd put them out there in case someone else is
interested.

regards, tom lane

Richard Guo

guofenglinux@gmail.com

7 months ago

In reply to: Tom Lane (#7)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Tue, Aug 26, 2025 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.

There are a couple loose ends yet to be dealt with:

* The fact that we now avoid duplicate unique-ifications is good,
but I think it breaks the test case added by 44e95b572, since
that's no longer demonstrating what it set out to demonstrate.
(See the delta in aggregates.out.) I'm not sure that that's a
huge problem, but we probably at least ought to modify the comment
on that test case.

* What happens if we find that *all* the semi_rhs_exprs are known
constant? We'll compute empty pathkeys and groupClause, but then
what? It looks like create_final_unique_paths etc then build a
useless Unique step. Maybe we should just absorb the input
relation's path list as-is?

Yeah, I think this is a better approach.

Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.

We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.

To be concrete, I'm imagining something like the attached patch.
Currently, it doesn't update the modified semi_rhs_exprs and
semi_operators back to SpecialJoinInfo, but that shouldn't be hard to
add.

One limitation is that the patch doesn't try to detect grouping
expressions that are known equal to each other, so you may still see
redundant grouping expressions, such as in the test case added by
44e95b572. I'm not sure if that's a big issue.

Thanks
Richard

#10

Richard Guo

guofenglinux@gmail.com

7 months ago

In reply to: Tom Lane (#7)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Tue, Aug 26, 2025 at 4:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Yeah. I think this is an oversight in create_unique_paths(): it's
building an ORDER BY list without consideration for the possibility
that some of the entries are known to be constant. In fact, because
make_pathkeys_for_sortclauses will get rid of redundancies, this
example actually ends up with a unique_rel whose unique_pathkeys
are shorter than the unique_groupclause, which is pretty bogus.

Here is a patch that fixes it that way. I like this better than
Sergey's approach because it is making the plans better not worse.

Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.

Thanks
Richard

#11

Tom Lane

tgl@sss.pgh.pa.us

7 months ago

In reply to: Richard Guo (#9)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

Richard Guo <guofenglinux@gmail.com> writes:

Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.

I did consider that, but rejected it. I kind of like the fact that
the v3 patch gets rid of duplicated columns-to-be-made-unique.
Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty,
because it's O(N^2) in the number of columns-to-be-made-unique ...
but realistically, how often is that ever more than a couple?
In the common case where the semijoin has only one join column,
the patch adds basically zero work, which is nice too.

We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.

Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path, but it's such a weird and
probably-useless-in-the-real-world case that I don't think it's
worth expending effort on. Failing outright seems fine.

Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.

Ugh ... I forgot you just refactored all that code.

I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.

regards, tom lane

#12

Maksim Milyutin

maksim.milyutin@tantorlabs.ru

7 months ago

In reply to: Tom Lane (#11)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On 8/26/25 20:26, Tom Lane wrote:

Richard Guo <guofenglinux@gmail.com> writes:

We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.

Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path

Unique node over incoming empty rows seems to act as LIMIT 1, in
particular, it breaks subplan execution after extracting the first row.
Then I see v3 patch covers this case perfectly.

--
Best regard,
Maksim Milyutin

#13

Richard Guo

guofenglinux@gmail.com

7 months ago

In reply to: Tom Lane (#11)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

Instead of repeatedly calling make_pathkeys_for_sortclauses to detect
redundant expressions, I'm wondering if we could introduce a function
that detects whether a given expression is known equal to a constant
by the EquivalenceClass machinery. This function should not be too
costly, as we'd only need to examine ECs that contain pseudoconstants.

I did consider that, but rejected it. I kind of like the fact that
the v3 patch gets rid of duplicated columns-to-be-made-unique.
Yes, repeatedly doing make_pathkeys_for_sortclauses is a bit grotty,
because it's O(N^2) in the number of columns-to-be-made-unique ...
but realistically, how often is that ever more than a couple?
In the common case where the semijoin has only one join column,
the patch adds basically zero work, which is nice too.

Fair point. I'm now inclined to agree that the v3 patch is the better
approach, as it removes duplicate grouping expressions.

I looked into the first loose end you mentioned earlier and tried to
write a query that would produce duplicate hash keys for HashAggregate
but failed. After checking all the callers of create_agg_path(), it
seems that we now eliminate duplicate grouping keys in all cases.

That commit mentioned the possibility that duplicate columns might
have different hash functions assigned, so we may still need the
changes in the executor. But we need to update the test case with a
new query that produces duplicate hash keys, or remove it if we cannot
find one.

On the other hand, we may want to add a similar query in join.sql to
show that duplicate grouping expressions can be removed during
unique-ification.

We could then use this function to remove expressions that are known
constant from semi_rhs_exprs. And if we find that all expressions
in semi_rhs_exprs are known constant (the second loose end you
mentioned), we can give up building unique paths and fall back to a
traditional JOIN_SEMI.

Yeah, I was thinking we could just use the paths of the existing
rel, but really this case means that we'd need to de-dup down
to a single row. We could maybe do something involving plastering
LIMIT 1 on top of each input path, but it's such a weird and
probably-useless-in-the-real-world case that I don't think it's
worth expending effort on. Failing outright seems fine.

Yeah. When computing semi_rhs_exprs in compute_semijoin_info(), we
punt if we find that there are no columns to unique-ify.

/* Punt if we didn't find at least one column to unique-ify */
if (semi_rhs_exprs == NIL)
return;

I think the case we're discussing here -- where all semi_rhs_exprs are
known constants -- is essentially the same in that it leaves us no
columns to unique-ify. So it should be fine to just punt in this case
as well.

So I think we may need to add the following changes:

+   /* If there are no columns left to unique-ify, return NULL. */
+   if (sortPathkeys == NIL && groupClause == NIL)
+   {
+       MemoryContextSwitchTo(oldcontext);
+       return NULL;
+   }
+
    /* build unique paths based on input rel's pathlist */

Another question we need to consider is how to fix this error in v18,
where it seems we'll need to remove redundant expressions during
createplan.c.

Ugh ... I forgot you just refactored all that code.

I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.

Yeah, inventing new stuff in v18 at this point seems risky. I think
backing out a3179ab69 should be fine for v18.

Thanks
Richard

#14

Tom Lane

tgl@sss.pgh.pa.us

6 months ago

In reply to: Richard Guo (#13)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

Richard Guo <guofenglinux@gmail.com> writes:

On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder if the right answer in v18 is to back out a3179ab69.
Not fixing that performance regression for another year would
be mildly annoying; but AFAIR we've still had only the one
complaint, so it seems like it's not hurting many people.
And we are getting mighty late in the release calendar to be
inventing new stuff in v18.

Yeah, inventing new stuff in v18 at this point seems risky. I think
backing out a3179ab69 should be fine for v18.

I had a go at doing that (attached) and the news is not good:
it dumps core in some regression tests added by the SJE patch,
seemingly because there are still references to the removed relid.

Now it might be that I failed to correctly adjust the places
where the revert didn't apply because of SJE-related changes.
But I think what is more likely is that the SJE code is not
bothering to update relids everywhere because it's expecting
a3179ab69's "rebuild" logic to take care of things later.

This leaves us with some pretty unappetizing choices about what
to do in v18:

1. Try to emulate the proposed HEAD fix.

2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.

3. Revert SJE as well as a3179ab69 in v18.

I will have a look at #1. If there's somebody who wants to look
at #2, feel free, but it won't be me because I don't understand
the SJE patch well enough. Either way, it's not great to be
doing stuff like this just days before rc1 ...

regards, tom lane

#15

Tom Lane

tgl@sss.pgh.pa.us

6 months ago

In reply to: Tom Lane (#14)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

I wrote:

This leaves us with some pretty unappetizing choices about what
to do in v18:
1. Try to emulate the proposed HEAD fix.

Actually, a preliminary look suggests that we can approximate
that quite well, because v18's create_unique_path() is already
responsible for preparing the list of columns to be de-duplicated
by a UniquePath. So we can stick the same logic about whether
make_pathkeys_for_sortclauses believes that each column adds something
into create_unique_path(). Unlike the case with HEAD, that doesn't
overlap with any useful work the function was already doing, but
I still think that the cost will be negligible in normal cases.

I'll try to have a patch along those lines by tomorrow.

regards, tom lane

#16

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Tom Lane (#14)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Thu, Aug 28, 2025 at 6:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This leaves us with some pretty unappetizing choices about what
to do in v18:

1. Try to emulate the proposed HEAD fix.

2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.

3. Revert SJE as well as a3179ab69 in v18.

I will have a look at #1. If there's somebody who wants to look
at #2, feel free, but it won't be me because I don't understand
the SJE patch well enough. Either way, it's not great to be
doing stuff like this just days before rc1 ...

I took a look at #2. I don't have a good understanding of the SJE
patch either, so I might be missing something.

The core dump you mentioned can be reproduced with this query.

create table sj (a int unique, b int);

explain (verbose, costs off)
select t3.a from sj t1
join sj t2 on t1.a = t2.a
join lateral (select t1.a offset 0) t3 on true;

The reason is that remove_rel_from_query() does not update baserels'
lateral_vars lists and thus there are still references to the removed
relid there.

After fixing that issue, another error occurs during the regression
tests.

explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
ERROR: variable not found in subplan target lists

The reason is that remove_rel_from_query() removes the old relid from
the attr_needed arrays but fails to add the substitute relid to them.

Hence, attached is the fix for SJE after reverting a3179ab69. With
it, all regression tests pass.

Thanks
Richard

#17

Tom Lane

tgl@sss.pgh.pa.us

6 months ago

In reply to: Richard Guo (#16)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Aug 28, 2025 at 6:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. Try to fix up the SJE patch so that it calculates relid changes
honestly, or at least no less honestly than what happened before
a3179ab69.

I took a look at #2. I don't have a good understanding of the SJE
patch either, so I might be missing something.

Thanks for looking at that!

Hence, attached is the fix for SJE after reverting a3179ab69. With
it, all regression tests pass.

I think that these are bugs/oversights in SJE that we should probably
fix in HEAD and v18, even if we're not going to revert a3179ab69.
We don't have strong reason to believe that they're not reachable
some other way.

Attached are a finished version of my v3 patch for HEAD, and the
promised adaptation of it for v18. The only surprise I ran into
was that I had to adopt the same sort of copy-from-the-parent
logic as you have in HEAD in create_unique_paths, because trying
to derive a child's list independently runs into assertion failures
inside make_pathkeys_for_sortclauses. In retrospect, probably
I shouldn't have been surprised.

With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now. (For the same reason, I suggest pushing
those SJE corrections sooner not later.)

regards, tom lane

#18

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Tom Lane (#17)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Fri, Aug 29, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Attached are a finished version of my v3 patch for HEAD, and the
promised adaptation of it for v18. The only surprise I ran into
was that I had to adopt the same sort of copy-from-the-parent
logic as you have in HEAD in create_unique_paths, because trying
to derive a child's list independently runs into assertion failures
inside make_pathkeys_for_sortclauses. In retrospect, probably
I shouldn't have been surprised.

With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now.

I reviewed the v5 patch and found one issue. For a child relation, we
shouldn't assume that the parent's unique rel or unique path always
exists. In cases where all RHS columns are equated to constants, we
currently don't build the unique rel/path for the parent table. This
issue results in SIGSEGV for the query below in both v18 and master.

create table p (a int, b int) partition by range(a);
create table p1 partition of p for values from (0) to (10);
create table p2 partition of p for values from (10) to (20);

set enable_partitionwise_join to on;

explain (costs off)
select * from p t1 where exists
(select 1 from p t2 where t1.a = t2.a and t1.a = 1);
server closed the connection unexpectedly

The fix is very simple:

@@ -8307,6 +8307,15 @@ create_unique_paths(PlannerInfo *root,
RelOptInfo *rel, SpecialJoinInfo *sjinfo)
if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
return NULL;

+   /*
+    * Punt if this is a child relation and we failed to build a unique-ified
+    * relation for its parent.  This can happen if all the RHS columns are
+    * found to be equated to constants when unique-ifying the parent table,
+    * leaving no columns to unique-ify.
+    */
+   if (IS_OTHER_REL(rel) && rel->top_parent->unique_rel == NULL)
+       return NULL;
+

I plan to push the fix to both v18 and master, if there are no
objections.

Thanks
Richard

#19

Tom Lane

tgl@sss.pgh.pa.us

6 months ago

In reply to: Richard Guo (#18)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

Richard Guo <guofenglinux@gmail.com> writes:

On Fri, Aug 29, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

With one eye on the fast-approaching release freeze for 18rc1,
I plan to go ahead and push these. Please do review if you
have time, but I think getting some buildfarm cycles on these
is time-critical now.

I reviewed the v5 patch and found one issue.

Ah, thanks for spotting that.

I plan to push the fix to both v18 and master, if there are no
objections.

Please do.

regards, tom lane

#20

Richard Guo

guofenglinux@gmail.com

6 months ago

In reply to: Tom Lane (#19)

Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error

On Fri, Aug 29, 2025 at 10:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

I plan to push the fix to both v18 and master, if there are no
objections.

Please do.

Done.

Thanks
Richard

[BUG] Remove self joins causes 'variable not found in subplan target lists' error

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: