Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

Started by Andrei Lepikhovabout 3 years ago46 messagesbugs
Jump to latest
#1Andrei Lepikhov
lepihov@gmail.com

Hi,
sqlancer benchmark raised an issue with pushing down clauses in LEFT
JOINs. Example:

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

EXPLAIN (ANALYZE, COSTS OFF)
SELECT ALL t1.x FROM t1, t2
LEFT OUTER JOIN t3
ON t3.x
LEFT OUTER JOIN t4
ON t3.x
WHERE t4.x ISNULL;

We should get zero tuples, right? But we have the explain:
Nested Loop (actual time=0.024..0.028 rows=4 loops=1)
...

REL_15_STABLE works correctly. As I remember, it could be induces by new
machinery, introduced by the 'Making Vars outer-join aware' patch.

Sorry for flood, if you are working on that issue.

--
regards,
Andrey Lepikhov
Postgres Professional

#2Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#1)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Wed, Feb 22, 2023 at 2:48 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

EXPLAIN (ANALYZE, COSTS OFF)
SELECT ALL t1.x FROM t1, t2
LEFT OUTER JOIN t3
ON t3.x
LEFT OUTER JOIN t4
ON t3.x
WHERE t4.x ISNULL;

Thanks for the report! I think this is a new issue that was not
reported before. I simplify this query a little for easy debugging as

# explain (costs off)
select * from t1 left join t2 on true left join t3 on t2.x where t3.x is
null;
QUERY PLAN
----------------------------------------
Nested Loop Left Join
-> Seq Scan on t1
-> Materialize
-> Nested Loop Left Join
Join Filter: t2.x
Filter: (t3.x IS NULL)
-> Seq Scan on t2
-> Materialize
-> Seq Scan on t3
(9 rows)

The qual 't3.x IS NULL' is placed at the wrong place. This qual's Var
't3.x' is marked with t2/t3 join in its varnullingrels by the parser,
which is right for the user-given order. After we've commuted t1/t2
join and t2/t3 join, Var 't3.x' can actually be nulled by both t2/t3
join and t1/t2 join. We neglect to adjust this qual accordingly in this
case.

ISTM that for outer join identity 3, if we are given form
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
then references to C Vars in higher qual levels would be marked with the
B/C join. If we've transformed it to form
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
then references to C Vars in higher qual levels should be adjusted to
include both B/C join and A/B join in their varnullingrels.

References to A Vars and B Vars in higher qual levels do not have this
problem though.

Thanks
Richard

#3Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#2)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Wed, Feb 22, 2023 at 6:24 PM Richard Guo <guofenglinux@gmail.com> wrote:

ISTM that for outer join identity 3, if we are given form
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
then references to C Vars in higher qual levels would be marked with the
B/C join. If we've transformed it to form
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
then references to C Vars in higher qual levels should be adjusted to
include both B/C join and A/B join in their varnullingrels.

A quick hack that comes to my mind is that for a pushed down clause we
check all outer join relids it mentions and add the outer joins'
commute_below to the clause's required_relids, so that after we've
commuted the outer joins, the clause would still be placed in the right
place.

--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2349,12 +2349,27 @@ distribute_qual_to_rels(PlannerInfo *root, Node
*clause,
    }
    else
    {
+       ListCell   *l;
+
        /*
         * Normal qual clause or degenerate outer-join clause.  Either way,
we
         * can mark it as pushed-down.
         */
        is_pushed_down = true;
+       /*
+        * Add in commute_below of outer joins mentioned within the clause,
so
+        * that after we've commuted the outer joins, the clause would
still be
+        * placed correctly.
+        */
+       foreach(l, root->join_info_list)
+       {
+           SpecialJoinInfo *sji = (SpecialJoinInfo *) lfirst(l);
+
+           if (bms_is_member(sji->ojrelid, relids))
+               relids = bms_add_members(relids, sji->commute_below);
+       }
+

For a formal fix, I wonder if we need to generate multiple versions of
such a clause and apply the appropriate one depending on which join
order is chosen, just like what we do for left join quals in
deconstruct_distribute_oj_quals.

Thanks
Richard

#4Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#3)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Thu, Feb 23, 2023 at 11:29 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Feb 22, 2023 at 6:24 PM Richard Guo <guofenglinux@gmail.com>
wrote:

ISTM that for outer join identity 3, if we are given form
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
then references to C Vars in higher qual levels would be marked with the
B/C join. If we've transformed it to form
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
then references to C Vars in higher qual levels should be adjusted to
include both B/C join and A/B join in their varnullingrels.

A quick hack that comes to my mind is that for a pushed down clause we
check all outer join relids it mentions and add the outer joins'
commute_below to the clause's required_relids, so that after we've
commuted the outer joins, the clause would still be placed in the right
place.

I've realized this hack is not correct :-(. References to A Vars and B
Vars in higher qual levels do not need this adjustment. So this code
change would add unnecessary relids for clauses that involve A Vars or B
Vars but not C Vars, resulting them being placed at higher place than
needed.

Maybe what we need is to add in commute_below_l (assuming it has been
recorded in SpecialJoinInfo) rather than commute_below of outer joins
mentioned within the clause?

Thanks
Richard

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#2)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

Richard Guo <guofenglinux@gmail.com> writes:

ISTM that for outer join identity 3, if we are given form
(A leftjoin B on (Pab)) leftjoin C on (Pbc)
then references to C Vars in higher qual levels would be marked with the
B/C join. If we've transformed it to form
A leftjoin (B leftjoin C on (Pbc)) on (Pab)
then references to C Vars in higher qual levels should be adjusted to
include both B/C join and A/B join in their varnullingrels.

Hmm. I remember having convinced myself that we didn't need to change
anything above the commuting OJs, but now I can't reconstruct the
reasoning :-(.

I wonder whether we could instead fix things by deeming that the result
of the pushed-down B/C join does not yet produce correct C* variables,
so that we won't allow conditions involving them to drop below the
pushed-up A/B join. This would be a little bit messy in some places
because now we'd consider that the A/B join is adding two OJ relids
not just one to the output join relid set, while the B/C join is adding
no relids even though it must execute an outer join. (But we already
have that notion for semijoins and some antijoins, so maybe it's fine.)

regards, tom lane

#6Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#4)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On 23/2/2023 11:36, Richard Guo wrote:

I've realized this hack is not correct :-(.  References to A Vars and B
Vars in higher qual levels do not need this adjustment.  So this code
change would add unnecessary relids for clauses that involve A Vars or B
Vars but not C Vars, resulting them being placed at higher place than
needed.

Maybe what we need is to add in commute_below_l (assuming it has been
recorded in SpecialJoinInfo) rather than commute_below of outer joins
mentioned within the clause?

Doing a comparison of pg15 and master initsplan.c I found that a clear
and documented algorithm for delaying the application of a qual is lost
or replaced with some commute_* fields.
It is not clear how we guarantee correct application delay of a qual.
Which part of the code implements it now?

--
regards,
Andrey Lepikhov
Postgres Professional

#7Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#5)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Fri, Feb 24, 2023 at 3:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder whether we could instead fix things by deeming that the result
of the pushed-down B/C join does not yet produce correct C* variables,
so that we won't allow conditions involving them to drop below the
pushed-up A/B join. This would be a little bit messy in some places
because now we'd consider that the A/B join is adding two OJ relids
not just one to the output join relid set, while the B/C join is adding
no relids even though it must execute an outer join. (But we already
have that notion for semijoins and some antijoins, so maybe it's fine.)

I think I see your points. Semijoins and antijoins derived from
semijoins do not have rtindex, so they do not add any OJ relids to the
output join relid set. Do you mean we do the similar thing to the
pushed-down B/C join here by not adding B/C's ojrelid to the output B/C
join, but instead add it later when we've formed the pushed-up A/B join?

I tried the codes below to adjust joinrelids and then use the modified
joinrelids when constructing restrict and join clause lists for the
joinrel. It seems to be able to solve the presented case. But I'm not
sure if it'd break other cases.

+  ListCell   *lc;
+  Relids      commute_below_l = NULL;
+
+  foreach(lc, root->join_info_list)
+  {
+      SpecialJoinInfo *otherinfo = (SpecialJoinInfo *) lfirst(lc);
+
+      if (otherinfo->jointype != JOIN_LEFT)
+          continue;
+
+       /* collect commute_below_l */
+      if (bms_is_member(otherinfo->ojrelid, sjinfo->commute_below) &&
+          bms_overlap(otherinfo->syn_righthand, sjinfo->syn_lefthand))
+          commute_below_l =
+              bms_add_member(commute_below_l, otherinfo->ojrelid);
+
+      /*
+       * add the pushed-down B/C join's relid to A/B join's relid set
+       */
+      if (bms_is_member(otherinfo->ojrelid, sjinfo->commute_above_l) &&
+          bms_overlap(otherinfo->min_righthand, joinrelids))
+          joinrelids = bms_add_member(joinrelids, otherinfo->ojrelid);
+  }
+
+  /*
+   * delete the pushed-down B/C join's relid from B/C join
+   */
+  if (!bms_is_empty(commute_below_l) &&
+      !bms_overlap(commute_below_l, joinrelids))
+      joinrelids = bms_del_member(joinrelids, sjinfo->ojrelid);

Also I'm wondering whether we can just copy what we once did in
check_outerjoin_delay() to update required_relids of a pushed-down
clause. This seems to be a lazy but workable way.

Thanks
Richard

#8Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#6)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Mon, Feb 27, 2023 at 2:23 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

Doing a comparison of pg15 and master initsplan.c I found that a clear
and documented algorithm for delaying the application of a qual is lost
or replaced with some commute_* fields.
It is not clear how we guarantee correct application delay of a qual.
Which part of the code implements it now?

Do you mean function check_outerjoin_delay()? Yes it has been removed
in b448f1c8, since now we consider that outer joins listed in
varnullingrels or phnullingrels are used in the clause, so that the
clause would not be placed below outer joins that should null some of
its vars.

Such as in query

select * from t1 left join t2 on true where coalesce(t2.a,1) = 1;

we'd consider that the clause uses t2 as well as t1/t2 join, so it
cannot be pushed down below the left join.

Thanks
Richard

#9Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#8)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On 27/2/2023 13:16, Richard Guo wrote:

On Mon, Feb 27, 2023 at 2:23 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

Doing a comparison of pg15 and master initsplan.c I found that a clear
and documented algorithm for delaying the application of a qual is lost
or replaced with some commute_* fields.
It is not clear how we guarantee correct application delay of a qual.
Which part of the code implements it now?

Do you mean function check_outerjoin_delay()?  Yes it has been removed
in b448f1c8, since now we consider that outer joins listed in
varnullingrels or phnullingrels are used in the clause, so that the
clause would not be placed below outer joins that should null some of
its vars.

I see. But these logics looks non-equivalent.
relnode.c::build_joinrel_tlist() adds relid into varnullingrels only in
the case when the var exists in the target list. So, other joins, which
don't have links to the var, don't get into the bitmapset too.

--
regards,
Andrey Lepikhov
Postgres Professional

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#9)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

On 27/2/2023 13:16, Richard Guo wrote:

Do you mean function check_outerjoin_delay()?  Yes it has been removed
in b448f1c8, since now we consider that outer joins listed in
varnullingrels or phnullingrels are used in the clause, so that the
clause would not be placed below outer joins that should null some of
its vars.

I see. But these logics looks non-equivalent.

They're not meant to be equivalent. The new code should be more
flexible and more trustworthy --- in particular, there's nothing that
didn't suck about the old outerjoin_delayed mechanism. Before we had
only a very ad-hoc model of where outer-join quals could be evaluated.
I've spent the last sixteen years living in fear that somebody would
find a bug in it that couldn't be fixed without making many plans
disastrously worse. Now we have something that actually faithfully
represents where quals can be evaluated and why.

Admittedly the new stuff is having more teething pains than I'd hoped
for, but we'll work through it. Most of the bugs are stemming from
the fact that we now need a rigorous representation of what is
happening when we commute outer joins per identity 3, and we're
finding out that we are not quite there yet.

regards, tom lane

#11Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#10)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On 28/2/2023 00:02, Tom Lane wrote:

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

On 27/2/2023 13:16, Richard Guo wrote:

Do you mean function check_outerjoin_delay()?  Yes it has been removed
in b448f1c8, since now we consider that outer joins listed in
varnullingrels or phnullingrels are used in the clause, so that the
clause would not be placed below outer joins that should null some of
its vars.

I see. But these logics looks non-equivalent.

Admittedly the new stuff is having more teething pains than I'd hoped
for, but we'll work through it. Most of the bugs are stemming from
the fact that we now need a rigorous representation of what is
happening when we commute outer joins per identity 3, and we're
finding out that we are not quite there yet.

Ok, maybe my language still not so fluent, as I think sometimes. In
accordance to the example above:
1. varnullingrels contains relids of entries which can null the var. In
our case it (correctly) contains t3 and OJ(t3,t4)
2. Syntactic scope of the clause correctly contains all relations and OJs
3. In the distribute_qual_to_rels I don't see a code which should
disallow pushing down a clause from higher syntactic level into nullable
side of OJ. Where is the logic which should limit the lowest level of
the clause push down?

--
regards,
Andrey Lepikhov
Postgres Professional

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#11)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:

Ok, maybe my language still not so fluent, as I think sometimes. In
accordance to the example above:
1. varnullingrels contains relids of entries which can null the var. In
our case it (correctly) contains t3 and OJ(t3,t4)
2. Syntactic scope of the clause correctly contains all relations and OJs
3. In the distribute_qual_to_rels I don't see a code which should
disallow pushing down a clause from higher syntactic level into nullable
side of OJ. Where is the logic which should limit the lowest level of
the clause push down?

It's the same logic that prevents push-down to joins that lack a
relation you're trying to reference, namely the checks of
RestrictInfo.required_relids. If a clause should be referring to the
possibly-nulled version of some variable, then its required_relids now
include the relid of the outer join that nulls that variable, so we
can't push it to below that outer join.

This is straightforward enough until you start applying outer join
identity 3:

3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
= A leftjoin (B leftjoin C on (Pbc)) on (Pab)
granting that Pbc is strict for B

Suppose that we are given a FROM clause in the first form and
that there's a non-strict WHERE clause referencing a column of C.
That clause should certainly see the nulled version of C, so if we
implement the join as written then we can apply the WHERE clause as
a filter condition (not join condition) on the join of A/B to C.
However, if we decide to implement the join according to the second
form of the identity, then the current code believes it can push
the WHERE clause down so that it's a filter on the join of B to C.
It's not quite obvious that that's wrong, but it is: we might see
C values that are non-null at this point but will become null after
application of the A/B join. Then the WHERE clause gives misleading
answers, which is exactly what's going wrong in this example.

The way I'm proposing to fix this is to say that if the B/C join
is done first, then its output C columns do not yet satisfy the
nulling action of the syntactically-upper outer join, so we can't
yet apply clauses that need to see those outputs. We can implement
that by not adding the outer join's relid to the joinrelids produced
by the B/C join: then clauses containing such Vars won't look like
they can be pushed down to that join. Instead they'll get pushed
to the AB/C join, and to make that happen we'll have to add both
outer-join relids to the identifying relids of that join level.

BTW, there's no problem if we start from the second form of the
identity and transform to the first, because then any C references
above the join nest are already marked as requiring the nulling
actions of both of the joins, so they won't fall below the upper
join no matter which order we do the joins in.

So another way we could fix this is to run around and forcibly mark
all such Vars with the relids of both outer joins, thus producing
a parse tree that looks more like the second form of the identity.
However, we're pretty much committed already to keeping all Vars
marked according to the syntactic tree structure, so that solution
would require revisiting a lot of code (not to mention the cost of
mutating the parse tree like that). I don't want to go that way
unless really forced into it.

I have a WIP patch that does it like this, and it fixes the presented
case; but it's not complete so it's still failing some existing
regression tests. More later.

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

I wrote:

I have a WIP patch that does it like this, and it fixes the presented
case; but it's not complete so it's still failing some existing
regression tests. More later.

Here's said patch. Although this fixes the described problem and
passes check-world, I'm not totally happy with it yet: it feels
like the new add_outer_joins_to_relids() function is too expensive
to be doing every time we construct a join relation. I wonder if
there is a way we can add more info to the SpecialJoinInfo data
structure to make it cheaper. An obvious improvement is to store
commute_below_l explicitly instead of recomputing it over and over,
but that isn't going to move the needle all that far. Is there a
way to not have to scan all the SpecialJoinInfos?

I may be worrying over nothing --- it's likely that the path
construction work that will happen after we make the join relation
RelOptInfo will swamp this cost anyway. But it feels ugly.

Another thing I'm wondering about is whether this'd let us get
rid of RestrictInfo.is_pushed_down and RINFO_IS_PUSHED_DOWN.
I tried really hard to remove those in favor of checking to see
whether a qual clause's required_relids include the outer join
being formed, which seems like a much more principled way of
deciding whether it's a filter or join clause. I couldn't get
that to work, but now I wonder if what I was running into was
really bugs associated with not understanding that we haven't
fully formed the lower outer join when we apply identity 3.

regards, tom lane

Attachments:

postpone-adding-pushed-down-ojrelid-to-relids.patchtext/x-diff; charset=us-ascii; name=postpone-adding-pushed-down-ojrelid-to-relids.patchDownload+198-27
#14Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#13)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Wed, Mar 1, 2023 at 3:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's said patch. Although this fixes the described problem and
passes check-world, I'm not totally happy with it yet: it feels
like the new add_outer_joins_to_relids() function is too expensive
to be doing every time we construct a join relation.

It seems that this change may affect how we select the appropriate
outer-join clause from redundant versions of that clause for an outer
join, because we make that decision relying on the joinrelids of the
outer join and outer joins below. If we've decided not to add the outer
join's relid to an outer join, we'd choose a clause that does not
contain that outer join's relid. As a result, we may have mismatched
nullingrels in joinqual and the join's target entry. I see this problem
in the query below.

select * from t1 left join t2 on true left join t3 on t2.x left join t4 on
t3.x;

When we build the join of t2/t3 to t4, we have two versions of the
joinqual 't3.x', one with t2/t3 join in the nullingrels, and one
without. The latter one would be chosen since we haven't added t2/t3
join's ojrelid. However, the targetlist of t2/t3 join would have the
t3 Vars marked with the join's ojrelid. So that we see the mismatched
nullingrels.

Do we need to revise how we build target list for outer join by
adjusting the nullingrels of Vars and PHVs from input_rel in a similar
way?

Thanks
Richard

#15Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#14)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Wed, Mar 1, 2023 at 2:44 PM Richard Guo <guofenglinux@gmail.com> wrote:

Do we need to revise how we build target list for outer join by
adjusting the nullingrels of Vars and PHVs from input_rel in a similar
way?

Hmm. Doing this complicates matters even more. Maybe we can just loosen
the cross-check for nullingrels to cope with this change by using
NRM_SUBSET matches for joinquals (including mergeclauses, hashclauses
and hashkeys) in set_join_references()?

Thanks
Richard

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#15)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

Richard Guo <guofenglinux@gmail.com> writes:

Hmm. Doing this complicates matters even more. Maybe we can just loosen
the cross-check for nullingrels to cope with this change by using
NRM_SUBSET matches for joinquals (including mergeclauses, hashclauses
and hashkeys) in set_join_references()?

That would be sad ... it'd basically be conceding defeat at the task
of knowing that we've accurately placed joinquals, which is one of
the most fundamental things I wanted to get out of this rewrite.

I might accept weakening those assertions as a stopgap that we plan to
work on more later, except that I'm afraid that this is telling us
there are still bugs in the area.

What's feeling like it might be the best thing is to go ahead and
syntactically convert to the second form of identity 3 as soon as
we've determined it's applicable, so that upper C Vars are always
marked with both OJ relids. Not sure how much work is involved
there.

regards, tom lane

#17Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#16)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Thu, Mar 2, 2023 at 11:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's feeling like it might be the best thing is to go ahead and
syntactically convert to the second form of identity 3 as soon as
we've determined it's applicable, so that upper C Vars are always
marked with both OJ relids. Not sure how much work is involved
there.

I'm thinking something that once we've determined identity 3 applies in
make_outerjoininfo we record information about how to adjust relids for
upper quals and store this info in all upper JoinTreeItems. Then
afterwards in distribute_qual_to_rels we check all this info stored in
current JoinTreeItem and adjust relids for the qual accordingly.

The info we record should consist of two parts, target_relids and
added_relids. Vars and PHVs in quals that belong to target_relids
should be adjusted to include added_relids.

Following this idea I come up with attached patch (no comments and test
cases yet). It fixes the presented case and passes check-world. Before
finishing it I'd like to know whether this idea works. Any comments are
appreciated.

Thanks
Richard

Attachments:

v1-0001-Draft-adjust-upper-quals.patchapplication/octet-stream; name=v1-0001-Draft-adjust-upper-quals.patchDownload+64-4
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#17)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

[ sorry for slow response ]

Richard Guo <guofenglinux@gmail.com> writes:

On Thu, Mar 2, 2023 at 11:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

What's feeling like it might be the best thing is to go ahead and
syntactically convert to the second form of identity 3 as soon as
we've determined it's applicable, so that upper C Vars are always
marked with both OJ relids. Not sure how much work is involved
there.

I'm thinking something that once we've determined identity 3 applies in
make_outerjoininfo we record information about how to adjust relids for
upper quals and store this info in all upper JoinTreeItems. Then
afterwards in distribute_qual_to_rels we check all this info stored in
current JoinTreeItem and adjust relids for the qual accordingly.

The info we record should consist of two parts, target_relids and
added_relids. Vars and PHVs in quals that belong to target_relids
should be adjusted to include added_relids.

Following this idea I come up with attached patch (no comments and test
cases yet). It fixes the presented case and passes check-world. Before
finishing it I'd like to know whether this idea works. Any comments are
appreciated.

This doesn't look hugely promising to me. We do need to do something
like "Vars/PHVs referencing join X now need to also reference join Y",
but I think we need to actually change the individual Vars/PHVs not
just fake it by hacking the required_relids of RestrictInfos. Otherwise
we're going to get assertion failures in setrefs.c about nullingrel
sets not matching up.

Also, in addition to upper-level quals, we need to apply the same
transformation to the query's targetlist and havingQual if anyway
(and perhaps also mergeActionList, not sure).

So the idea that I had was to, once we detect that X commutes with Y,
immediately call a function that recurses through the relevant parts
of root->parse and performs the required nullingrels updates. This'd
result in multiple transformation traversals if there are several
commuting pairs of joins, but I doubt that that would really pose a
performance problem. If it does, we could imagine splitting up
deconstruct_jointree still further, so that detection of commutable
OJs happens in a pass earlier than distribute_quals_to_rels, and then
we fix up Vars/PHVs throughout the tree in one scan done between those
passes. But an extra deconstruct recursion pass isn't free either, so
I don't want to do that unless we actually see a performance problem.
Assuming we don't do that but perform this transformation during
deconstruct_jointree phase 2, it'd be too late to modify quals of
already-processed join tree nodes, but that's fine because they
couldn't contain Vars needing adjustment. (We could exploit that
in an incomplete way by passing in the address of the current
JoinExpr, and not bothering to recurse below it.) There is code
roughly like this in prepjointree.c already --- in particular, we'd
need to steal its hack of mutating join quals but not the JoinExpr and
FromExpr nodes themselves, to avoid breaking the recursion-in-progress.

I can have a go at coding this, or you can if you'd like to.

regards, tom lane

#19Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#18)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Tue, Mar 21, 2023 at 12:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This doesn't look hugely promising to me. We do need to do something
like "Vars/PHVs referencing join X now need to also reference join Y",
but I think we need to actually change the individual Vars/PHVs not
just fake it by hacking the required_relids of RestrictInfos. Otherwise
we're going to get assertion failures in setrefs.c about nullingrel
sets not matching up.

Agreed that we need to actually change the individual Vars/PHVs because
I find that with my patch it's not easy to fake the quals deduced from
ECs as there is no 'current JoinTreeItem' there. I'm not sure that
there is nullingrels mis-match issue though, because when forming an
outer join's target list we've ensured that the output Vars are always
marked according to the syntactic tree structure.

So the idea that I had was to, once we detect that X commutes with Y,
immediately call a function that recurses through the relevant parts
of root->parse and performs the required nullingrels updates.

So IIUC the idea is that if we are given the first form of identity 3,
we update all upper C Vars so that they are marked with both OJ relids.
It seems that this may have nullingrels mis-match issue because of how
we form outer join's target list. Consider

A leftjoin B on (Pab) leftjoin C on (Pbc) inner join D on (Pcd)

At first C Vars in Pcd is marked only with B/C join. Then we detect
that A/B join commutes with B/C join so we update C Vars in Pcd as being
marked with B/C join and A/B join. However C Vars in the target list of
joinrel A/B/C would be marked only with B/C join as that is consistent
with the syntactic tree structure. So when we process the joinqual Pcd
for C/D join in setrefs.c, I think we would have assertion failures
about nullingrels mismatch. (This is just my imagination. I haven't
implemented the codes to verify it.)

Thanks
Richard

#20Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#19)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)

On Tue, Mar 21, 2023 at 2:23 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Mar 21, 2023 at 12:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So the idea that I had was to, once we detect that X commutes with Y,
immediately call a function that recurses through the relevant parts
of root->parse and performs the required nullingrels updates.

So IIUC the idea is that if we are given the first form of identity 3,
we update all upper C Vars so that they are marked with both OJ relids.
It seems that this may have nullingrels mis-match issue because of how
we form outer join's target list.

I do have a try coding along this idea to update all upper C Vars being
marked with both OJ relids if the two OJs commute according to identity
3. And then I realize there are several other adjustments needed to
make it work.

* As I imagined before, we need to revise how we form outer join's
target list to avoid nullingrels mismatch. Currently the outer join's
target list is built in a way that the output Vars have the same nulling
bitmaps that they would if the two joins had been done in syntactic
order. And now we need to change that to always marking the output C
Vars with both joins.

* We need to revise how we check if identity 3 applies and if so how we
adjust min_lefthand in make_outerjoininfo. Consider

A leftjoin B on (Pab) leftjoin C on (Pbc) leftjoin D on (Pcd)

So now the C Vars in Pcd are marked with A/B join and B/C join since the
two joins can commute. When it comes to check if C/D join can commute
with B/C join, the A/B join from C Vars' nullingrels would make us think
that C/D join cannot commute with B/C join, which is not correct.
Besides, C/D join's min_lefthand including A/B join is also problematic.

* We need to adjust how we generate multiple versions for LEFT JOIN
quals in deconstruct_distribute_oj_quals, to cope with any new added
nulling rel. But I haven't figured out this point clearly yet.

So it seems there is some work involved. I begin to wonder if this is
the way we want.

Thanks
Richard

#21Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#14)
#22Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#21)
#23Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#22)
#24Richard Guo
guofenglinux@gmail.com
In reply to: Andrei Lepikhov (#23)
#25Andrei Lepikhov
lepihov@gmail.com
In reply to: Richard Guo (#24)
#26Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#13)
#27Richard Guo
guofenglinux@gmail.com
In reply to: Richard Guo (#26)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Richard Guo (#27)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#29)
#31Richard Guo
guofenglinux@gmail.com
In reply to: Robert Haas (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#32)
#34Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#32)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#34)
#36Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#37)
#39Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#39)
#41Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#41)
#43Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#43)
#45Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrei Lepikhov (#45)