pgsql: Avoid mislabeling of lateral references when pulling up a subque
Avoid mislabeling of lateral references when pulling up a subquery.
If we are pulling up a subquery that's under an outer join, and
the subquery's target list contains a strict expression that uses
both a subquery variable and a lateral-reference variable, it's okay
to pull up the expression without wrapping it in a PlaceHolderVar.
That's safe because if the subquery variable is forced to NULL
by the outer join, the expression result will come out as NULL too,
so we don't have to force that outcome by evaluating the expression
below the outer join. It'd be correct to wrap in a PHV, but that can
lead to very significantly worse plans, since we'd then have to use
a nestloop plan to pass down the lateral reference to where the
expression will be evaluated.
However, when we do that, we should not mark the lateral reference
variable as being nulled by the outer join, because it isn't after
we pull up the expression in this way. So the marking logic added
by cb8e50a4a was incorrect in this detail, leading to "wrong
varnullingrels" errors from the consistency-checking logic in
setrefs.c. It seems to be sufficient to just not mark lateral
references at all in this case. (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)
Per report from Bertrand Mamasam. Back-patch to v16, as the previous
patch was.
Discussion: /messages/by-id/CACZ67_UA_EVrqiFXJu9XK50baEpH=ofEPJswa2kFxg6xuSw-ww@mail.gmail.com
Branch
------
REL_16_STABLE
Details
-------
https://git.postgresql.org/pg/commitdiff/85990e2fd5610576635c65db9292297b1730c947
Modified Files
--------------
src/backend/optimizer/prep/prepjointree.c | 19 +++++++++++--
src/test/regress/expected/subselect.out | 47 +++++++++++++++++++++++++++++++
src/test/regress/sql/subselect.sql | 16 +++++++++++
3 files changed, 80 insertions(+), 2 deletions(-)
On Fri, Nov 29, 2024 at 7:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It seems to be sufficient to just not mark lateral
references at all in this case. (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)
You're right about your feeling. Here is a query that breaks it.
create table t (a int, b int);
explain (costs off)
select x from
t t1 left join
(t t2 left join
lateral (select t2.a+t3.a as x, * from t t3) t3 on t2.a <> t3.a)
on t1.b = t2.b;
ERROR: wrong varnullingrels (b) (expected (b 5)) for Var 2/1
'x' is nulled by ojrelids {4, 5}. When pulling up the subquery, it's
right that we should not mark the lateral reference variable 't2' as
being nulled by {4}, but we should mark it as being nulled by {5}.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
On Fri, Nov 29, 2024 at 7:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It seems to be sufficient to just not mark lateral
references at all in this case. (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)
You're right about your feeling. Here is a query that breaks it.
Ah, thanks for the test case. I'll look into it tomorrow.
regards, tom lane
On Fri, Nov 29, 2024 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
On Fri, Nov 29, 2024 at 7:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It seems to be sufficient to just not mark lateral
references at all in this case. (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)
You're right about your feeling. Here is a query that breaks it.
Ah, thanks for the test case. I'll look into it tomorrow.
I spent some time looking into this issue. For the Vars/PHVs of the
subquery, we should mark them as being nulled by all the outer joins
in var->varnullingrels, for sure. For the Vars/PHVs of lateral
references, it seems that how we mark them depends on whether they are
under the same lowest nulling outer join as the subquery.
First of all, the lateral references cannot be outside of the lowest
outer join above the subquery; otherwise, is_simple_subquery() would
consider the subquery not eligible for pull-up. So, IIUC, the lateral
references should be marked either with the full var->varnullingrels,
or with that set excluding lowest_outer_join->rtindex, depending on
whether they are under the same lowest nulling outer join as the
subquery or not.
For example:
explain (costs off)
select x from
t t1 left join
(t t2 left join
lateral (select t2.a+t3.a as x, * from t t3) t3 on t2.a <> t3.a)
on t1.b = t2.b;
In this query, the lowest_outer_join is t2/t3 join {4}, and the
varnullingrels of 'x' is {4, 5}. For the lateral reference variable
't2.a', it is NOT under the same lowest nulling outer join, so we
should mark it with {5}.
explain (costs off)
select x from
t t1 left join
(t t2 inner join
lateral (select t2.a+t3.a as x, * from t t3) t3 on t2.a <> t3.a)
on t1.b = t2.b;
In this query, the lateral reference variable 't2.a' is under the same
lowest nulling outer join as the subquery, so we should just mark it
with var->varnullingrels.
I've drafted a patch based on this idea. I borrowed the concept of
lowest_nullable_relids from another patch of mine [1]/messages/by-id/CAMbWs4_eDRaJ9vv5zmNV=TbgkwtkXVmCVn7xpXCjksukc7NeEA@mail.gmail.com, which uses
lowest_nullable_relids to avoid wrapping lateral references that are
under the same lowest nulling outer join.
Please ignore all the comments and formatting stuff in the patch. I
haven't worked on those, and I haven't included a test case yet.
[1]: /messages/by-id/CAMbWs4_eDRaJ9vv5zmNV=TbgkwtkXVmCVn7xpXCjksukc7NeEA@mail.gmail.com
Thanks
Richard
Attachments:
v1-0001-Fix-wrong-varnullingrels.patchapplication/octet-stream; name=v1-0001-Fix-wrong-varnullingrels.patchDownload+98-18
Richard Guo <guofenglinux@gmail.com> writes:
I spent some time looking into this issue.
Thanks for looking at it!
First of all, the lateral references cannot be outside of the lowest
outer join above the subquery; otherwise, is_simple_subquery() would
consider the subquery not eligible for pull-up.
Yeah. While looking at this, I've been wondering if we don't have
enough infrastructure now to relax is_simple_subquery's restrictions.
The way it is now was designed to keep the world safe for placing
quals based on is_pushed_down flags (cf c64de21e9), but maybe with
OJ-labeled Vars we don't need to be so restrictive. Clearly, a
back-patched bug fix is no place to actually make such a change,
but we might want to code the fix with the thought in mind that that
could happen later. In particular, I'm not sure I buy the theory
that there's just one relid to get rid of when calculating the
appropriate nullingrels for a lateral reference.
Also, I don't think I buy the assumption that all the lateral
references need the same nullingrels. Your test case could easily
be extended to have lateral refs coming from two different levels
of outer join. So I think we need to calculate the right new
nullingrels for each lateral-ref varno appearing in the expression,
and do add_nulling_relids() over again for each one.
The ideas I'd been toying with last night involved a pre-scan over
the join tree to calculate the potential nullingrels of each leaf RTE
(same idea as RelOptInfo.nulling_relids, but of course we don't have
any RelOptInfos yet). That seems painful though because we'd have to
update the data structure somehow after each subquery pullup. It
might be better just to assume that pulled-up lateral references are
uncommon and push the work into the case where we have one, rather than
try to do setup work to make that case cheaper. With that approach,
you could imagine writing a function that traverses the join tree
from the top and computes the set of OJ relids that can potentially
null a particular target relid. Then we'd intersect that with the
varnullingrels of the Var being replaced to derive the correct
nullingrels for the lateral-ref Vars. (If we do it like that,
we may not need lowest_nullable_side etc yet, but I attach some
comments on that code anyway.)
I've drafted a patch based on this idea. I borrowed the concept of
lowest_nullable_relids from another patch of mine [1], which uses
lowest_nullable_relids to avoid wrapping lateral references that are
under the same lowest nulling outer join.
I find lowest_nullable_side fairly messy/confusing, in particular
that it might point at something that's not either side of the
lowest_outer_join parameter. I wonder whether there's another way
to do that. At the very least, the header comment for
pull_up_subqueries_recurse should point out that the two values
might be unrelated.
The calculation of lowest_nullable_relids in pull_up_simple_subquery
doesn't feel right --- it needs some comments at least.
On the same reasoning that a back-patched fix is no place to be
introducing unnecessary changes, I don't agree with revising the
rules for whether to wrap plain Vars/PHVs in this patch. We can
do that in a separate HEAD-only patch.
Do you want to continue working on this, or shall I take it?
The bug is my fault in the end :-(
regards, tom lane
I wrote:
The ideas I'd been toying with last night involved a pre-scan over
the join tree to calculate the potential nullingrels of each leaf RTE
(same idea as RelOptInfo.nulling_relids, but of course we don't have
any RelOptInfos yet). That seems painful though because we'd have to
update the data structure somehow after each subquery pullup.
I realized that we can make that work by doing the pre-calculation
at the start of each pull_up_simple_subquery. We only have to do
it for subqueries marked LATERAL, so this doesn't seem too horribly
inefficient. Draft patch attached. I didn't spend effort on
devising test cases, but it works for your example.
regards, tom lane
Attachments:
v2-0001-Fix-wrong-varnullingrels.patchtext/x-diff; charset=us-ascii; name=v2-0001-Fix-wrong-varnullingrels.patchDownload+154-5
On Sat, Nov 30, 2024 at 6:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
The ideas I'd been toying with last night involved a pre-scan over
the join tree to calculate the potential nullingrels of each leaf RTE
(same idea as RelOptInfo.nulling_relids, but of course we don't have
any RelOptInfos yet). That seems painful though because we'd have to
update the data structure somehow after each subquery pullup.I realized that we can make that work by doing the pre-calculation
at the start of each pull_up_simple_subquery. We only have to do
it for subqueries marked LATERAL, so this doesn't seem too horribly
inefficient. Draft patch attached. I didn't spend effort on
devising test cases, but it works for your example.
This patch looks good to me. I like that it will still work if,
someday, the restrictions of is_simple_subquery are relaxed to allow
lateral references to be outside the lowest outer join above the
subquery.
This patch can also simplify my other patch, which is to avoid
unnecessary wrapping for plain Vars/PHVs. We can check the new
nullingrel_info to see if the nullingrels of the subquery RTE are a
subset of the nullingrels of the lateral referenced rel, to determine
if the referenced rel is under the same lowest nulling outer join.
And this eliminates the need to introduce lowest_nullable_side.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
This patch can also simplify my other patch, which is to avoid
unnecessary wrapping for plain Vars/PHVs. We can check the new
nullingrel_info to see if the nullingrels of the subquery RTE are a
subset of the nullingrels of the lateral referenced rel, to determine
if the referenced rel is under the same lowest nulling outer join.
And this eliminates the need to introduce lowest_nullable_side.
Right, because that's more or less the same problem. Or indeed
it's exactly the same problem, we're just making fast paths for
the easiest cases.
Another thought: the reason I made get_nullingrels return a new
struct rather than tying it directly to filling some fields in
pullup_replace_vars_context was that I think we might want to
reconsider where to call it from. Perhaps it'd be useful to
have the info during is_simple_subquery, for example.
regards, tom lane
On 11/29/24 05:33, Tom Lane wrote:
Avoid mislabeling of lateral references when pulling up a subquery.
If we are pulling up a subquery that's under an outer join, and
the subquery's target list contains a strict expression that uses
both a subquery variable and a lateral-reference variable, it's okay
to pull up the expression without wrapping it in a PlaceHolderVar.
That's safe because if the subquery variable is forced to NULL
by the outer join, the expression result will come out as NULL too,
so we don't have to force that outcome by evaluating the expression
below the outer join. It'd be correct to wrap in a PHV, but that can
lead to very significantly worse plans, since we'd then have to use
a nestloop plan to pass down the lateral reference to where the
expression will be evaluated.
Pardon the noise, but I'm curious why the optimiser must choose NestLoop
in the case of lateral reference.
It would be nice to provide alternatives. Because now we have some
corner cases. For example, with pull-up correlated subqueries, we've got
one degraded case. Look the following:
DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1(x int, y int);
CREATE TABLE t2(x int, y int);
INSERT INTO t1 (x,y)
SELECT gs,-gs FROM generate_series(1,1E4) AS gs;
ANALYZE t1,t2;
EXPLAIN (ANALYZE, COSTS ON)
SELECT t1.* FROM t1 LEFT JOIN LATERAL (
SELECT t3.* FROM t1 t3 WHERE t3.x=t1.x) AS t4
ON (t4.y IN (SELECT y FROM t2 WHERE t4.x=t2.x));
In previous versions Postgres executed this plan in milliseconds:
Hash Left Join
Hash Cond: (t1.x = t3.x)
-> Seq Scan on t1
-> Hash
-> Seq Scan on t1 t3
Filter: (ANY (y = (SubPlan 1).col1))
SubPlan 1
-> Seq Scan on t2
Filter: (t3.x = x)
Planning Time: 0.175 ms
Execution Time: 6.396 ms
But now we have seconds:
Nested Loop Left Join
-> Seq Scan on t1
-> Nested Loop Semi Join
Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
-> Seq Scan on t1 t3
Filter: (x = t1.x)
-> Seq Scan on t2
Planning Time: 1.309 ms
Execution Time: 6780.217 ms
Correlated subquery pull-up is a nice optimisation, of course. So, why
not let optimiser try a HashJoin like that (not a really generated
plan, just my imagination):
Hash Left Join
Hash Cond: (t1.x = t3.x)
-> Seq Scan on t1
Hash
-> Nested Loop Semi Join
Join Filter: ((t3.x = t2.x) AND (t3.y = t2.y))
-> Seq Scan on t1 t3
-> Seq Scan on t2
Does the optimiser have some internal limits to let such a path?
--
regards, Andrei Lepikhov
Andrei Lepikhov <lepihov@gmail.com> writes:
Pardon the noise, but I'm curious why the optimiser must choose NestLoop
in the case of lateral reference.
To pass down the current outer row's value of the lateral reference.
It would be nice to provide alternatives. Because now we have some
corner cases.
That's a fairly bizarre case, and I'm not at all convinced that
the old plan was correct.
regards, tom lane