ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
I found an error similar to others before ([1]/messages/by-id/CAHewXNnu7u1aT==WjnCRa+SzKb6s80hvwPP_9eMvvvtdyFdqjw@mail.gmail.com) that is still persists as of head right now (0bcb3ca3b9).
CREATE TABLE t (
n INTEGER
);
SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (WITH cte AS (SELECT * FROM t WHERE t.n = ljl1.n) SELECT * FROM cte) ljl2 ON ljl1.n = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Note that the error does **not** occur if the CTE is unwrapped like this:
SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (SELECT * FROM t WHERE t.n = ljl1.n) ljl2 ON ljl1.n = 1;
-markus
[1]: /messages/by-id/CAHewXNnu7u1aT==WjnCRa+SzKb6s80hvwPP_9eMvvvtdyFdqjw@mail.gmail.com
On Tue, May 30, 2023 at 10:48 PM Markus Winand <markus.winand@winand.at>
wrote:
I found an error similar to others before ([1]) that is still persists as
of head right now (0bcb3ca3b9).CREATE TABLE t (
n INTEGER
);SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (WITH cte AS (SELECT * FROM t WHERE t.n = ljl1.n)
SELECT * FROM cte) ljl2 ON ljl1.n = 1;ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Note that the error does **not** occur if the CTE is unwrapped like this:
SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (SELECT * FROM t WHERE t.n = ljl1.n) ljl2 ON ljl1.n =
1;
Thanks for the report! Reproduced here. Also it can be reproduced with
subquery, as long as the subquery is not pulled up.
SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (SELECT * FROM t WHERE t.n = ljl1.n offset 0) ljl2 ON
ljl1.n = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
When we transform the first form of identity 3 to the second form, we've
converted Pb*c to Pbc in deconstruct_distribute_oj_quals. But we
neglect to consider that rel C might be a RTE_SUBQUERY and contains
quals that have lateral references to B. So the B vars in such quals
have wrong nulling bitmaps and we'd finally notice that when we do
fix_upper_expr for the NestLoopParam expressions.
Thanks
Richard
On Wed, May 31, 2023 at 10:47 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, May 30, 2023 at 10:48 PM Markus Winand <markus.winand@winand.at>
wrote:I found an error similar to others before ([1]) that is still persists as
of head right now (0bcb3ca3b9).CREATE TABLE t (
n INTEGER
);SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (WITH cte AS (SELECT * FROM t WHERE t.n = ljl1.n)
SELECT * FROM cte) ljl2 ON ljl1.n = 1;ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
Note that the error does **not** occur if the CTE is unwrapped like this:
SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (SELECT * FROM t WHERE t.n = ljl1.n) ljl2 ON ljl1.n =
1;Thanks for the report! Reproduced here. Also it can be reproduced with
subquery, as long as the subquery is not pulled up.SELECT *
FROM (VALUES (1)) t(c)
LEFT JOIN t ljl1 ON true
LEFT JOIN LATERAL (SELECT * FROM t WHERE t.n = ljl1.n offset 0) ljl2 ON
ljl1.n = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1When we transform the first form of identity 3 to the second form, we've
converted Pb*c to Pbc in deconstruct_distribute_oj_quals. But we
neglect to consider that rel C might be a RTE_SUBQUERY and contains
quals that have lateral references to B. So the B vars in such quals
have wrong nulling bitmaps and we'd finally notice that when we do
fix_upper_expr for the NestLoopParam expressions.
We can identify in which form of identity 3 the plan is built up by
checking the relids of the B/C join's outer rel. If it's in the first
form, the outer rel's relids must contain the A/B join. Otherwise it
should only contain B's relid. So I'm considering that maybe we can
adjust the nulling bitmap for nestloop parameters according to that.
Attached is a patch for that. Does this make sense?
Thanks
Richard
Attachments:
v1-0001-Fix-nulling-bitmap-for-nestloop-parameters.patchapplication/octet-stream; name=v1-0001-Fix-nulling-bitmap-for-nestloop-parameters.patchDownload+33-1
On 31.05.2023, at 08:36, Richard Guo <guofenglinux@gmail.com> wrote:
Attached is a patch for that. Does this make sense?
Thanks
Richard
<v1-0001-Fix-nulling-bitmap-for-nestloop-parameters.patch>
All I can say is that it fixes the error for me — also for the non-simplified original query that I have.
-markus
Richard Guo <guofenglinux@gmail.com> writes:
On Wed, May 31, 2023 at 10:47 AM Richard Guo <guofenglinux@gmail.com> wrote:
When we transform the first form of identity 3 to the second form, we've
converted Pb*c to Pbc in deconstruct_distribute_oj_quals. But we
neglect to consider that rel C might be a RTE_SUBQUERY and contains
quals that have lateral references to B. So the B vars in such quals
have wrong nulling bitmaps and we'd finally notice that when we do
fix_upper_expr for the NestLoopParam expressions.
Right. One question that immediately raises is whether it's even safe
to apply the identity if C has lateral references to B, because that
almost certainly means that C will produce different output when
joined to a nulled B row than when joined to a not-nulled row.
I think that's okay because if the B row will fail Pab then it doesn't
matter what row(s) C produces, but maybe I've missed something.
We can identify in which form of identity 3 the plan is built up by
checking the relids of the B/C join's outer rel. If it's in the first
form, the outer rel's relids must contain the A/B join. Otherwise it
should only contain B's relid. So I'm considering that maybe we can
adjust the nulling bitmap for nestloop parameters according to that.
Attached is a patch for that. Does this make sense?
Hmm. I don't really want to do it in identify_current_nestloop_params
because that gets applied to all nestloop params, so it seems like
that risks masking bugs of other kinds. I'd rather do it in
process_subquery_nestloop_params, which we know is only applied to
subquery LATERAL references. So more or less as attached.
(I dropped the equal() assertions in process_subquery_nestloop_params
because they've never caught anything and it'd be too complicated to
make them still work.)
Thoughts?
regards, tom lane
Attachments:
v2-0001-Fix-nulling-bitmap-for-nestloop-parameters.patchtext/x-diff; charset=us-ascii; name=v2-0001-Fix-nulling-bitmap-for-nestloop-parameters.patchDownload+59-10
On Sat, Jun 10, 2023 at 12:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
We can identify in which form of identity 3 the plan is built up by
checking the relids of the B/C join's outer rel. If it's in the first
form, the outer rel's relids must contain the A/B join. Otherwise it
should only contain B's relid. So I'm considering that maybe we can
adjust the nulling bitmap for nestloop parameters according to that.
Attached is a patch for that. Does this make sense?Hmm. I don't really want to do it in identify_current_nestloop_params
because that gets applied to all nestloop params, so it seems like
that risks masking bugs of other kinds. I'd rather do it in
process_subquery_nestloop_params, which we know is only applied to
subquery LATERAL references. So more or less as attached.
Yeah, that makes sense. process_subquery_nestloop_params is a better
place to do this adjustments. +1 to v2 patch.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
Yeah, that makes sense. process_subquery_nestloop_params is a better
place to do this adjustments. +1 to v2 patch.
Pushed, then.
regards, tom lane
On Mon, Jun 12, 2023 at 10:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
Yeah, that makes sense. process_subquery_nestloop_params is a better
place to do this adjustments. +1 to v2 patch.Pushed, then.
Oh, wait ... It occurred to me that we may have this same issue with
Memoize cache keys. In get_memoize_path we collect the cache keys from
innerpath's ppi_clauses and innerrel's lateral_vars, and the latter may
contain nullingrel markers that need adjustment. As an example,
consider the query below
explain (costs off)
select * from onek t1
left join onek t2 on true
left join lateral
(select * from onek t3 where t3.two = t2.two offset 0) s
on t2.unique1 = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/3
Attached is a patch that does the same adjustments to innerrel's
lateral_vars before they are added to MemoizePath->param_exprs.
I was wondering if there are more places that need this kind of
adjustments. After some thoughts I believe the Memoize cache keys
should be the last one regarding adjustments to nestloop parameters.
AFAICS the lateral references in origin query would go to two places,
one is plan_params and the other is lateral_vars. And now we've handled
both of them.
Thanks
Richard
Attachments:
v3-0001-Fix-nulling-bitmap-for-Memoize-cache-keys.patchapplication/octet-stream; name=v3-0001-Fix-nulling-bitmap-for-Memoize-cache-keys.patchDownload+70-6
Richard Guo <guofenglinux@gmail.com> writes:
Oh, wait ... It occurred to me that we may have this same issue with
Memoize cache keys. In get_memoize_path we collect the cache keys from
innerpath's ppi_clauses and innerrel's lateral_vars, and the latter may
contain nullingrel markers that need adjustment. As an example,
consider the query below
explain (costs off)
select * from onek t1
left join onek t2 on true
left join lateral
(select * from onek t3 where t3.two = t2.two offset 0) s
on t2.unique1 = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/3
Good catch --- I'll take a closer look tomorrow.
regards, tom lane
I wrote:
Richard Guo <guofenglinux@gmail.com> writes:
Oh, wait ... It occurred to me that we may have this same issue with
Memoize cache keys.
Good catch --- I'll take a closer look tomorrow.
Pushed after a little more fiddling with the comments.
regards, tom lane
On Wed, Jun 14, 2023 at 6:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Richard Guo <guofenglinux@gmail.com> writes:
Oh, wait ... It occurred to me that we may have this same issue with
Memoize cache keys.Good catch --- I'll take a closer look tomorrow.
Pushed after a little more fiddling with the comments.
I just realized that we may still have holes in this area. Until now
we're mainly focusing on LATERAL subquery, in which case the lateral
reference Vars are copied into rel->subplan_params and we've already
adjusted the nulling bitmaps there. But what about the lateral
reference Vars in other cases?
In extract_lateral_references() we consider 5 cases,
/* Fetch the appropriate variables */
if (rte->rtekind == RTE_RELATION)
vars = pull_vars_of_level((Node *) rte->tablesample, 0);
else if (rte->rtekind == RTE_SUBQUERY)
vars = pull_vars_of_level((Node *) rte->subquery, 1);
else if (rte->rtekind == RTE_FUNCTION)
vars = pull_vars_of_level((Node *) rte->functions, 0);
else if (rte->rtekind == RTE_TABLEFUNC)
vars = pull_vars_of_level((Node *) rte->tablefunc, 0);
else if (rte->rtekind == RTE_VALUES)
vars = pull_vars_of_level((Node *) rte->values_lists, 0);
else
{
Assert(false);
return; /* keep compiler quiet */
}
We've handled the second case, i.e., RTE_SUBQUERY. It's not hard to
compose a query for each of the other 4 cases that shows that we need to
adjust the nulling bitmaps for them too.
1. RTE_RELATION with tablesample
explain (costs off)
select * from int8_tbl t1
left join int8_tbl t2 on true
left join lateral
(select * from int8_tbl t3 TABLESAMPLE SYSTEM (t2.q1)) s
on t2.q1 = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
2. RTE_FUNCTION
explain (costs off)
select * from int8_tbl t1
left join int8_tbl t2 on true
left join lateral
(select * from generate_series(t2.q1, 100)) s
on t2.q1 = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
3. RTE_TABLEFUNC
explain (costs off)
select * from xmltest2 t1
left join xmltest2 t2 on true
left join lateral
xmltable('/d/r' PASSING t2.x COLUMNS a int)
on t2._path = 'a';
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
4. RTE_VALUES
explain (costs off)
select * from int8_tbl t1
left join int8_tbl t2 on true
left join lateral
(select q1 from (values(t2.q1), (t2.q1)) v(q1)) s
on t2.q1 = 1;
ERROR: wrong varnullingrels (b 3) (expected (b)) for Var 2/1
So it seems that we need to do nullingrel adjustments in a more common
place.
Also, there might be lateral references in the tlist, so the query below
is supposed to also encounter the 'wrong varnullingrels' error.
explain (costs off)
select * from int8_tbl t1
left join int8_tbl t2 on true
left join lateral
(select t2.q1 from int8_tbl t3) s
on t2.q1 = 1;
server closed the connection unexpectedly
But as we can see, it triggers the Assert in try_nestloop_path.
/* If we got past that, we shouldn't have any unsafe outer-join refs */
Assert(!have_unsafe_outer_join_ref(root, outerrelids, inner_paramrels));
I think it exposes a new issue. It seems that we extract a problematic
lateral_relids from lateral references within PlaceHolderVars in
create_lateral_join_info. I doubt that we should use ph_lateral
directly. It seems more reasonable to me that we strip outer-join
relids from ph_lateral and then use that for lateral_relids.
Any thoughts?
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
I just realized that we may still have holes in this area. Until now
we're mainly focusing on LATERAL subquery, in which case the lateral
reference Vars are copied into rel->subplan_params and we've already
adjusted the nulling bitmaps there. But what about the lateral
reference Vars in other cases?
Ugh.
So it seems that we need to do nullingrel adjustments in a more common
place.
I agree: this suggests that we fixed it in the wrong place.
I think it exposes a new issue. It seems that we extract a problematic
lateral_relids from lateral references within PlaceHolderVars in
create_lateral_join_info. I doubt that we should use ph_lateral
directly. It seems more reasonable to me that we strip outer-join
relids from ph_lateral and then use that for lateral_relids.
Hmm. I don't have time to think hard about this today, but this
does feel similar to our existing decision that parameterized paths
should be generated with minimal nullingrels bits on their outer
references. We only thought about pushed-down join clauses when we
did that. But a lateral ref necessarily gives rise to parameterized
path(s), and what we seem to be seeing is that those need to be
handled just the same as ones generated by pushing down join clauses.
regards, tom lane
I wrote:
Richard Guo <guofenglinux@gmail.com> writes:
So it seems that we need to do nullingrel adjustments in a more common
place.
I agree: this suggests that we fixed it in the wrong place.
So pursuant to that, 0001 attached reverts the code changes from bfd332b3f
and 63e4f13d2 (keeping the test cases and some unrelated comment fixes).
Then the question is what to do instead. I've not come up with a better
idea than to hack it in identify_current_nestloop_params (per 0002), as
you proposed upthread. I don't like this too much, as it's on the hairy
edge of making setrefs.c's nullingrel cross-checks completely useless for
NestLoopParams; but the alternatives aren't attractive either.
I think it exposes a new issue. It seems that we extract a problematic
lateral_relids from lateral references within PlaceHolderVars in
create_lateral_join_info. I doubt that we should use ph_lateral
directly. It seems more reasonable to me that we strip outer-join
relids from ph_lateral and then use that for lateral_relids.
I experimented with that (0003) and it fixes your example query.
I think it is functionally okay, because the lateral_relids just need
to be a sufficient subset of the lateral references' requirements to
ensure we can evaluate them where needed; other mechanisms should ensure
that the right sorts of joins happen. It seems a bit unsatisfying
though, especially given that we just largely lobotomized setrefs.c's
cross-checks for these same references. I don't have a better idea
however, and beta2 is fast approaching.
regards, tom lane