Do not scan index in right table if condition for left join evaluates to false using columns in left table

Started by Илья Жарковover 1 year ago15 messages
Jump to latest
#1Илья Жарков
izharkov1243@gmail.com

Hi, could you help to understand why Postgres scans index in right table in
the following case:

CREATE TABLE parent (

id integer PRIMARY KEY,
dtype text
);

CREATE TABLE child (

id integer PRIMARY KEY
);

INSERT INTO parent (id, dtype) values (1, 'A');

INSERT INTO child (id) values (1);

EXPLAIN ANALYZE

SELECT *
FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'
WHERE p.id = 1;

Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.

Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual

time=0.104..0.107 rows=1 loops=1)
Join Filter: (p.dtype = 'B'::text)
Rows Removed by Join Filter: 1
-> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1
width=36) (actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (id = 1)
-> Index Only Scan using child_pkey on child c (cost=0.15..8.17
rows=1 width=4) (actual time=0.078..0.080 rows=1 loops=1)
Index Cond: (id = 1)
Heap Fetches: 1

In comparison, if using INNER JOIN, Index Only Scan on *child *table is
never executed.
Tested on PostgreSQL 17.2

#2Andres Freund
andres@anarazel.de
In reply to: Илья Жарков (#1)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Hi,

On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:

Hi, could you help to understand why Postgres scans index in right table in
the following case:

CREATE TABLE parent (

id integer PRIMARY KEY,
dtype text
);

CREATE TABLE child (

id integer PRIMARY KEY
);

INSERT INTO parent (id, dtype) values (1, 'A');

INSERT INTO child (id) values (1);

EXPLAIN ANALYZE

SELECT *
FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'
WHERE p.id = 1;

Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.

The relevant difference between the inner and left join is that for the inner
join we push down the p.dtype = 'B'::text condition down. However, we do *not*
do so for outer joins.

Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

We *do* have code that recognizes the case where a clause in a join's ON only
references the nullable side. We however don't have code that recognizes the
same if it's the non-nullable side.

That's somewhat surprising, but it does kinda make sense: A after all, in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE list,
rather than inside the join's ON. The same isn't true for the nullable side of
the join, as a condition for te nullable side in the WHERE clause breaks the
join's "outerness".

I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id = 1 AND p.dtype = 'B';
in which case you get the expected query plan:
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual time=0.035..0.036 rows=0 loops=1) │
│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ Rows Removed by Filter: 1 │
│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17 rows=1 width=4) (never executed) │
│ Index Cond: (id = 1) │
│ Heap Fetches: 0 │
│ Planning Time: 29.912 ms │
│ Execution Time: 0.095 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?

Greetings,

Andres Freund

PS:

Here's the repro in an easier to copy way:

CREATE TABLE parent (id integer PRIMARY KEY,dtype text not null);
CREATE TABLE child (id integer PRIMARY KEY);
INSERT INTO parent (id, dtype) values (1, 'A');
INSERT INTO child (id) values (1);

EXPLAIN SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;
EXPLAIN SELECT * FROM parent p JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Andres Freund <andres@anarazel.de> writes:

ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?

No, that condition *can't* be pushed down to the LHS scan, because its
failure should not remove LHS rows from the output; it can only cause
them to have nulls in the RHS columns.

One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.
But this would be a bunch of new mechanism that's only useful for
outer joins, only for rather hokey outer join conditions, and only
for nestloop-type joins. I'm pretty dubious that it's worth the
trouble -- not least because I don't recall anybody complaining
about this before.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#2)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

On 2024-12-07 16:37:32 -0500, Andres Freund wrote:

On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:

Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.

The relevant difference between the inner and left join is that for the inner
join we push down the p.dtype = 'B'::text condition down. However, we do *not*
do so for outer joins.

Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

We *do* have code that recognizes the case where a clause in a join's ON only
references the nullable side. We however don't have code that recognizes the
same if it's the non-nullable side.

That's somewhat surprising, but it does kinda make sense: A after all, in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE list,
rather than inside the join's ON. The same isn't true for the nullable side of
the join, as a condition for te nullable side in the WHERE clause breaks the
join's "outerness".

I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id = 1 AND p.dtype = 'B';
in which case you get the expected query plan:
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual time=0.035..0.036 rows=0 loops=1) │
│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ Rows Removed by Filter: 1 │
│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17 rows=1 width=4) (never executed) │
│ Index Cond: (id = 1) │
│ Heap Fetches: 0 │
│ Planning Time: 29.912 ms │
│ Execution Time: 0.095 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Ah, wait - that's not actually equivalent. When p.dtype = 'B' inthe ON clause,
we still produce an output row, but not if in the WHERE clause.

So:

ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?

yes, I was missing something, it would not be a valid transformation.

I guess it's still somewhat silly that we do an index scan for the inner side,
even though we could know that we'll always fail to the joinqual. We could
evaluate the qual after fetching the outer row, before fetching the matching
inner row.

It's might not be worth adding code to handle such cases to the the nested
loop code, this is probably not that common a query pattern. If we don't want
to have explict execution time code paths, we could emit a "constant
qualification" Result node above the inner side of a parametrized nested loop?

Greetings,

Andres Freund

#5Илья Жарков
izharkov1243@gmail.com
In reply to: Andres Freund (#4)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

I've just realized that I replied not to the whole mailing list. This is a
duplicate of mail sent to Andres.

Thank you for the quick response.

Some background using code and concepts of particular languages and
frameworks:

Exactly such queries are automatically built by Hibernate if using
polymorphic queries with fetching associations of sub-classes using
function treat.
https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#hql-function-treat

Reproducer:

@Entity
@Inheritance
@DiscriminatorValue("A")
class Parent {
@Id
Integer id;
}

@Entity
@DiscriminatorValue("B")
class ParentB extends Parent {
@OneToOne(mappedBy = "parent")
Child child;
}

@Entity
class Child {
@Id
Integer id;

@OneToOne
@MapsId
@JoinColumn(name = "id")
private ParentB parent;
}

List<Parent> resultList = session.createQuery(
"""
from Parent p
left join fetch treat(p as ParentB).child
where p.id = :id
""", Parent.class)
.setParameter("id", 1)
.getResultList();

вс, 8 дек. 2024 г. в 01:21, Andres Freund <andres@anarazel.de>:

Show quoted text

On 2024-12-07 16:37:32 -0500, Andres Freund wrote:

On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:

Note that the only record in *parent *table has dtype == 'A', but the

join

condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with

loops=1.

The relevant difference between the inner and left join is that for the

inner

join we push down the p.dtype = 'B'::text condition down. However, we do

*not*

do so for outer joins.

Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘

We *do* have code that recognizes the case where a clause in a join's ON

only

references the nullable side. We however don't have code that recognizes

the

same if it's the non-nullable side.

That's somewhat surprising, but it does kinda make sense: A after all,

in a

query like yours, you could just have had the p.dtype = 'B' in the WHERE

list,

rather than inside the join's ON. The same isn't true for the nullable

side of

the join, as a condition for te nullable side in the WHERE clause breaks

the

join's "outerness".

I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id =

1 AND p.dtype = 'B';

in which case you get the expected query plan:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐

│ QUERY PLAN

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤

│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual

time=0.035..0.036 rows=0 loops=1) │

│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17

rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │

│ Index Cond: (id = 1)

│ Filter: (dtype = 'B'::text)

│ Rows Removed by Filter: 1

│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17

rows=1 width=4) (never executed) │

│ Index Cond: (id = 1)

│ Heap Fetches: 0

│ Planning Time: 29.912 ms

│ Execution Time: 0.095 ms

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Ah, wait - that's not actually equivalent. When p.dtype = 'B' inthe ON
clause,
we still produce an output row, but not if in the WHERE clause.

So:

ISTM that it shouldn't be expensive to recognize this type of join

clause and

pushes them down. While it could be done by the query's author, it seems

worth

handling this on our side. But maybe I'm missing something here?

yes, I was missing something, it would not be a valid transformation.

I guess it's still somewhat silly that we do an index scan for the inner
side,
even though we could know that we'll always fail to the joinqual. We could
evaluate the qual after fetching the outer row, before fetching the
matching
inner row.

It's might not be worth adding code to handle such cases to the the nested
loop code, this is probably not that common a query pattern. If we don't
want
to have explict execution time code paths, we could emit a "constant
qualification" Result node above the inner side of a parametrized nested
loop?

Greetings,

Andres Freund

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Hi,

On 2024-12-07 17:06:52 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?

No, that condition *can't* be pushed down to the LHS scan, because its
failure should not remove LHS rows from the output; it can only cause
them to have nulls in the RHS columns.

Yea, just sent an email saying so after realizing the same.

One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.
But this would be a bunch of new mechanism that's only useful for
outer joins, only for rather hokey outer join conditions, and only
for nestloop-type joins. I'm pretty dubious that it's worth the
trouble -- not least because I don't recall anybody complaining
about this before.

As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.

Not convinced that such queries are that hokey though, it's not that rare a
thing to want to left join to some other table only if the outer side matches
some attribute. We ourselves do have a few cases like that in our queries,
e.g. psql's describeOneTableDetails(), pg_dump's getTables() and several
others. ...

Greetings,

Andres Freund

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Andres Freund <andres@anarazel.de> writes:

On 2024-12-07 17:06:52 -0500, Tom Lane wrote:

One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.

As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.

Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.

regards, tom lane

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Tom Lane (#7)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

On 12/8/24 06:13, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2024-12-07 17:06:52 -0500, Tom Lane wrote:

One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.

As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.

Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.

Well, I have been working on this topic for some time. Let me share some
experiences and thoughts. They may be helpful.
I had a user request on this feature, a kind of 'semi-pushdown' clause.
As production people said, it is quite typical in sales systems to have
a table of products and additional tables describing product categories.
Something like that:

CREATE TABLE products (
id int PRIMARY KEY, payload text DEFAULT 'product',
price real DEFAULT random(), type text);
CREATE TABLE vehicles (id int PRIMARY KEY, maxspeed int DEFAULT 250);
CREATE TABLE phones (id int PRIMARY KEY, screensize real DEFAULT 6.1);

INSERT INTO products (id,type)
SELECT x,'v' FROM generate_series(1,5E4) AS x;
INSERT INTO products (id,type)
SELECT x,'p' FROM generate_series(1+5E4,1E5) AS x;
INSERT INTO vehicles (id) SELECT x FROM generate_series(1,5E4) AS x;
INSERT INTO phones (id) SELECT x FROM generate_series(1+5E4,1E5) AS x;
VACUUM ANALYZE products, vehicles, phones;

They usually need to get top-sales products from different categories
querying database with something like that:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id)
LEFT JOIN vehicles v ON (p.id = v.id)
WHERE p.price > 0.9;

Users just want to optimise such queries and get description of
different products within a single query. The query they wish to execute
looks like this:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id AND p.type = 'p')
LEFT JOIN vehicles v ON (p.id = v.id AND p.type = 'v')
WHERE p.price > 0.9;

The request was: why not check the RHS-only clause inside a join node
and save cycles?

To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to touch.
Also, I wonder why you think it could work with NestLoop only. I think
avoiding touching a hash table and an index under MergeJoin can also be
beneficial.

The patch and reproduction script with resulting EXPLAINs are in the
attachment. Petr Petrov is doing further work. He may provide additional
details, if any.

--
regards, Andrei Lepikhov

Attachments:

reproduction.sqlapplication/sql; name=reproduction.sqlDownload
v2-lazy-join.difftext/x-patch; charset=UTF-8; name=v2-lazy-join.diffDownload+58-5
#9Andres Freund
andres@anarazel.de
In reply to: Andrei Lepikhov (#8)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Hi,

On 2024-12-08 09:23:42 +0700, Andrei Lepikhov wrote:

To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to touch.

Cool!

Also, I wonder why you think it could work with NestLoop only.

It doesn't seem to apply to mergejoins. For hashjoins, it doesn't seem worth a
separate evaluation of a expression - something which has noticeable overhead
on its own - given that the cost of a lookup in the hashtable isn't
particularly high. Compare that to the nestloop case where you always need an
indexscan and often will have more complicated subtrees being executed
unnecessarily on the inner side.

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.

How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Andres Freund <andres@anarazel.de> writes:

On 2024-12-08 09:23:42 +0700, Andrei Lepikhov wrote:

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.

How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?

Yeah, sounds like nonsense to me too. The reason this can be a win
for nestloop is that we perform a new scan of the inner relation
for each outer row, and if we can skip an inner scan altogether
then we're ahead. But mergejoin and hashjoin only scan each input
once anyway, so I see no opportunity to save any scan work.

It's barely possible that this is interesting for hash join
because you could save scanning a hash list ... but frankly
I don't believe there could be enough win there to justify
additional mechanism. Hash join is in a world of hurt already
if there are lots of duplicate hash codes.

regards, tom lane

#11Andrei Lepikhov
lepihov@gmail.com
In reply to: Andres Freund (#9)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

On 8/12/2024 09:52, Andres Freund wrote:

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.

How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?

In my mind, this trick can be designed for specific cases like sales
tables, as illustrated before and used by well-rounded developers. I'm
not sure that such optimisation would be profitable in general. My point
is that the sales database has lots of categories, and when requesting
product descriptions, we will not necessarily touch all the categories -
in that case, the one-sided clause could allow us to avoid scanning some
tables at all. Am I wrong?
BTW, may it be used in SEMI JOIN cases?

--
regards, Andrei Lepikhov

#12Илья Жарков
izharkov1243@gmail.com
In reply to: Andrei Lepikhov (#11)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Regarding merge joins, I suppose in some edge cases inner set scan might
not even be started.

FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'

┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

If p.dtype = 'B' was evaluated early, the pointer could move through the
outer set as long as it is evaluated to false.
In the above example, it reaches the end without even accessing the inner
set.

┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

In the opposite scenario:

┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

it would need to start moving the inner pointer at some point.

┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^

But even in this case, the pointer may not reach the end in the inner set.

Though this highly depends on how merge join is implemented in the Postgres
code. I have to admit that I have a very vague idea on this...

вс, 8 дек. 2024 г. в 11:44, Andrei Lepikhov <lepihov@gmail.com>:

Show quoted text

On 8/12/2024 09:52, Andres Freund wrote:

I think avoiding touching a hash table and an index under MergeJoin can

also

be beneficial.

How would you get significant wins for mergejoins? You need to go

through both

inner and outer anyway?

In my mind, this trick can be designed for specific cases like sales
tables, as illustrated before and used by well-rounded developers. I'm
not sure that such optimisation would be profitable in general. My point
is that the sales database has lots of categories, and when requesting
product descriptions, we will not necessarily touch all the categories -
in that case, the one-sided clause could allow us to avoid scanning some
tables at all. Am I wrong?
BTW, may it be used in SEMI JOIN cases?

--
regards, Andrei Lepikhov

#13Andres Freund
andres@anarazel.de
In reply to: Andrei Lepikhov (#11)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

On 2024-12-08 15:44:23 +0700, Andrei Lepikhov wrote:

On 8/12/2024 09:52, Andres Freund wrote:

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.

How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?

In my mind, this trick can be designed for specific cases like sales tables,
as illustrated before and used by well-rounded developers.

I'm not saying that the optimization isn't useful, just that i don't think it
makes much sense for mergeoins.

Besides, at least as far as I can see, fundamentally not optimizing mergejoins
in any substantial manner, it also just doesn't seem likely that queries where
this optimization would come up are likely to be planned as mergejoins. If you
have a leftjoin to a category-style table, you're very rarely going to scan
the "main" table with an ordered index scan on the category key, which would
be required for a merge join. And sorting the main table once for each
to-be-joined-category-table isn't a desirable path most of the time either.

I don't know what it means that it's to be "used by well-rounded
developers". We have to write the implementation in way it works regardless of
what kind of developer is using postgres.

I'm not sure that such optimisation would be profitable in general.

Are you suggesting it would only be enabled by a GUC?

My point is that the sales database has lots of categories, and when
requesting product descriptions, we will not necessarily touch all the
categories - in that case, the one-sided clause could allow us to avoid
scanning some tables at all. Am I wrong?

No, I don't think you're wrong. I just don't think it's worthwhile for
mergejoins, because I don't see relevant query patterns where it would
help.

BTW, may it be used in SEMI JOIN cases?

Seems possible.

Greetings,

Andres Freund

#14Andrei Lepikhov
lepihov@gmail.com
In reply to: Andres Freund (#13)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

On 8/12/2024 23:37, Andres Freund wrote:

On 2024-12-08 15:44:23 +0700, Andrei Lepikhov wrote:

On 8/12/2024 09:52, Andres Freund wrote:

I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.

How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?

In my mind, this trick can be designed for specific cases like sales tables,
as illustrated before and used by well-rounded developers.

I'm not saying that the optimization isn't useful, just that i don't think it
makes much sense for mergeoins.

Besides, at least as far as I can see, fundamentally not optimizing mergejoins
in any substantial manner, it also just doesn't seem likely that queries where
this optimization would come up are likely to be planned as mergejoins. If you
have a leftjoin to a category-style table, you're very rarely going to scan
the "main" table with an ordered index scan on the category key, which would
be required for a merge join. And sorting the main table once for each
to-be-joined-category-table isn't a desirable path most of the time either.

I don't know what it means that it's to be "used by well-rounded
developers". We have to write the implementation in way it works regardless of
what kind of developer is using postgres.

I got it, thanks. I agree that the profit for MJ & HJ cases looks
modest. When designing the prototype, I realised that it seemed more
"symmetrical" and logical to separate such 'one-sided' clauses during
the planning for all types of join.
But I don't insist on this statement. It would be OK even in the case of
NestLoop (remember, we should change the NL cost model too).

I'm not sure that such optimisation would be profitable in general.

Are you suggesting it would only be enabled by a GUC?

No, such a feature probably won't generate much overhead; I don't see
any reason to cover it under a GUC.

--
regards, Andrei Lepikhov

#15eng eng
pspetrov91@gmail.com
In reply to: Andrei Lepikhov (#8)
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

пн, 16 дек. 2024 г. в 12:52, Andrei Lepikhov <lepihov@gmail.com>:

On 12/8/24 06:13, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2024-12-07 17:06:52 -0500, Tom Lane wrote:

One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.

As I wrote in my other email, I'm also somewhat dubious it's worth

having

explicit code for this in nodeNestloop.c.

Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.

Well, I have been working on this topic for some time. Let me share some
experiences and thoughts. They may be helpful.
I had a user request on this feature, a kind of 'semi-pushdown' clause.
As production people said, it is quite typical in sales systems to have
a table of products and additional tables describing product categories.
Something like that:

CREATE TABLE products (
id int PRIMARY KEY, payload text DEFAULT 'product',
price real DEFAULT random(), type text);
CREATE TABLE vehicles (id int PRIMARY KEY, maxspeed int DEFAULT 250);
CREATE TABLE phones (id int PRIMARY KEY, screensize real DEFAULT 6.1);

INSERT INTO products (id,type)
SELECT x,'v' FROM generate_series(1,5E4) AS x;
INSERT INTO products (id,type)
SELECT x,'p' FROM generate_series(1+5E4,1E5) AS x;
INSERT INTO vehicles (id) SELECT x FROM generate_series(1,5E4) AS x;
INSERT INTO phones (id) SELECT x FROM generate_series(1+5E4,1E5) AS x;
VACUUM ANALYZE products, vehicles, phones;

They usually need to get top-sales products from different categories
querying database with something like that:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id)
LEFT JOIN vehicles v ON (p.id = v.id)
WHERE p.price > 0.9;

Users just want to optimise such queries and get description of
different products within a single query. The query they wish to execute
looks like this:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id AND p.type = 'p')
LEFT JOIN vehicles v ON (p.id = v.id AND p.type = 'v')
WHERE p.price > 0.9;

The request was: why not check the RHS-only clause inside a join node
and save cycles?

To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to
touch.
Also, I wonder why you think it could work with NestLoop only. I think
avoiding touching a hash table and an index under MergeJoin can also be
beneficial.

The patch and reproduction script with resulting EXPLAINs are in the
attachment. Petr Petrov is doing further work. He may provide additional
details, if any.

--
regards, Andrei Lepikhov

Hello!

I would like to describe the main idea:

1) We would like to speed up Nested Loop Left Join execution since getting
the next inner tuple could be expensive.

Also, not all outer tuples should be matched. For example, we have 200k
rows and inner tuples are extracted only for 100 of them.

That could allow us to execute Nested Loop Left Join faster.

2) When we analyze RestrictInfo's clauses it is possible to detect whether
it's sufficient to use outer tuple data to calculate them.

If it is the case, then we save them in rhs_joinrinfo. In the current
patch version that was done in create_nestloop_path().

Then in make_nestloop() rhs_joinqual clauses were removed from
joinqual.

Then execute rhs_joinqual in ExecNestLoop() before the next inner tuple
extraction.

If the result of their evaluation is false then we fill inner tuple's
fields with nulls and return the result without visiting the inner
relation.

3) I also propose to add a new type of filter: Outer Tuple Filter.

It's a filter which is evaluated after getting the outer tuple but
before extracting the inner tuple.

Join filter is evaluated after grabbing the inner tuple, that's why I
suggest to differentiate them.

To calculate unmatched tuples by Outer Tuple Filter an additional
counter was added to Instrumentation struct: int nunmatched.

You can find the execution plan in reproduction.sql

The patch was based on commit 3191eccd8a9bff1715f2e4fab86d2932a556185e

At least, make check-world has passed.

That's my first experience in making patches, so apologize if I
misunderstand or explain something poorly.

Looking forward to your questions and feedback.

---
Best regards,
Peter Petrov

Attachments:

reproduction.sqlapplication/sql; name=reproduction.sqlDownload
v3-lazy-join.difftext/x-patch; charset=US-ASCII; name=v3-lazy-join.diffDownload+150-61