Do not scan index in right table if condition for left join evaluates to false using columns in left table
Hi, could you help to understand why Postgres scans index in right table in
the following case:
CREATE TABLE parent (
id integer PRIMARY KEY,
dtype text
);
CREATE TABLE child (
id integer PRIMARY KEY
);
INSERT INTO parent (id, dtype) values (1, 'A');
INSERT INTO child (id) values (1);
EXPLAIN ANALYZE
SELECT *
FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'
WHERE p.id = 1;
Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.
Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual
time=0.104..0.107 rows=1 loops=1)
Join Filter: (p.dtype = 'B'::text)
Rows Removed by Join Filter: 1
-> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1
width=36) (actual time=0.018..0.019 rows=1 loops=1)
Index Cond: (id = 1)
-> Index Only Scan using child_pkey on child c (cost=0.15..8.17
rows=1 width=4) (actual time=0.078..0.080 rows=1 loops=1)
Index Cond: (id = 1)
Heap Fetches: 1
In comparison, if using INNER JOIN, Index Only Scan on *child *table is
never executed.
Tested on PostgreSQL 17.2
Hi,
On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:
Hi, could you help to understand why Postgres scans index in right table in
the following case:CREATE TABLE parent (
id integer PRIMARY KEY,
dtype text
);CREATE TABLE child (
id integer PRIMARY KEY
);INSERT INTO parent (id, dtype) values (1, 'A');
INSERT INTO child (id) values (1);
EXPLAIN ANALYZE
SELECT *
FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'
WHERE p.id = 1;Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.
The relevant difference between the inner and left join is that for the inner
join we push down the p.dtype = 'B'::text condition down. However, we do *not*
do so for outer joins.
Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘
Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘
We *do* have code that recognizes the case where a clause in a join's ON only
references the nullable side. We however don't have code that recognizes the
same if it's the non-nullable side.
That's somewhat surprising, but it does kinda make sense: A after all, in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE list,
rather than inside the join's ON. The same isn't true for the nullable side of
the join, as a condition for te nullable side in the WHERE clause breaks the
join's "outerness".
I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id = 1 AND p.dtype = 'B';
in which case you get the expected query plan:
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual time=0.035..0.036 rows=0 loops=1) │
│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ Rows Removed by Filter: 1 │
│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17 rows=1 width=4) (never executed) │
│ Index Cond: (id = 1) │
│ Heap Fetches: 0 │
│ Planning Time: 29.912 ms │
│ Execution Time: 0.095 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?
Greetings,
Andres Freund
PS:
Here's the repro in an easier to copy way:
CREATE TABLE parent (id integer PRIMARY KEY,dtype text not null);
CREATE TABLE child (id integer PRIMARY KEY);
INSERT INTO parent (id, dtype) values (1, 'A');
INSERT INTO child (id) values (1);
EXPLAIN SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;
EXPLAIN SELECT * FROM parent p JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;
Andres Freund <andres@anarazel.de> writes:
ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?
No, that condition *can't* be pushed down to the LHS scan, because its
failure should not remove LHS rows from the output; it can only cause
them to have nulls in the RHS columns.
One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.
But this would be a bunch of new mechanism that's only useful for
outer joins, only for rather hokey outer join conditions, and only
for nestloop-type joins. I'm pretty dubious that it's worth the
trouble -- not least because I don't recall anybody complaining
about this before.
regards, tom lane
On 2024-12-07 16:37:32 -0500, Andres Freund wrote:
On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:
Note that the only record in *parent *table has dtype == 'A', but the join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table with loops=1.The relevant difference between the inner and left join is that for the inner
join we push down the p.dtype = 'B'::text condition down. However, we do *not*
do so for outer joins.Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘We *do* have code that recognizes the case where a clause in a join's ON only
references the nullable side. We however don't have code that recognizes the
same if it's the non-nullable side.That's somewhat surprising, but it does kinda make sense: A after all, in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE list,
rather than inside the join's ON. The same isn't true for the nullable side of
the join, as a condition for te nullable side in the WHERE clause breaks the
join's "outerness".I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id = 1 AND p.dtype = 'B';
in which case you get the expected query plan:
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual time=0.035..0.036 rows=0 loops=1) │
│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17 rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ Rows Removed by Filter: 1 │
│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17 rows=1 width=4) (never executed) │
│ Index Cond: (id = 1) │
│ Heap Fetches: 0 │
│ Planning Time: 29.912 ms │
│ Execution Time: 0.095 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Ah, wait - that's not actually equivalent. When p.dtype = 'B' inthe ON clause,
we still produce an output row, but not if in the WHERE clause.
So:
ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?
yes, I was missing something, it would not be a valid transformation.
I guess it's still somewhat silly that we do an index scan for the inner side,
even though we could know that we'll always fail to the joinqual. We could
evaluate the qual after fetching the outer row, before fetching the matching
inner row.
It's might not be worth adding code to handle such cases to the the nested
loop code, this is probably not that common a query pattern. If we don't want
to have explict execution time code paths, we could emit a "constant
qualification" Result node above the inner side of a parametrized nested loop?
Greetings,
Andres Freund
I've just realized that I replied not to the whole mailing list. This is a
duplicate of mail sent to Andres.
Thank you for the quick response.
Some background using code and concepts of particular languages and
frameworks:
Exactly such queries are automatically built by Hibernate if using
polymorphic queries with fetching associations of sub-classes using
function treat.
https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#hql-function-treat
Reproducer:
@Entity
@Inheritance
@DiscriminatorValue("A")
class Parent {
@Id
Integer id;
}
@Entity
@DiscriminatorValue("B")
class ParentB extends Parent {
@OneToOne(mappedBy = "parent")
Child child;
}
@Entity
class Child {
@Id
Integer id;
@OneToOne
@MapsId
@JoinColumn(name = "id")
private ParentB parent;
}
List<Parent> resultList = session.createQuery(
"""
from Parent p
left join fetch treat(p as ParentB).child
where p.id = :id
""", Parent.class)
.setParameter("id", 1)
.getResultList();
вс, 8 дек. 2024 г. в 01:21, Andres Freund <andres@anarazel.de>:
Show quoted text
On 2024-12-07 16:37:32 -0500, Andres Freund wrote:
On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:
Note that the only record in *parent *table has dtype == 'A', but the
join
condition has p.dtype = 'B'.
The query plan still shows Index Only Scan on *child *table withloops=1.
The relevant difference between the inner and left join is that for the
inner
join we push down the p.dtype = 'B'::text condition down. However, we do
*not*
do so for outer joins.
Outer:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join │
│ Join Filter: (p.dtype = 'B'::text) │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘Inner:
┌───────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────┤
│ Nested Loop │
│ -> Index Scan using parent_pkey on parent p │
│ Index Cond: (id = 1) │
│ Filter: (dtype = 'B'::text) │
│ -> Index Only Scan using child_pkey on child c │
│ Index Cond: (id = 1) │
└───────────────────────────────────────────────────┘We *do* have code that recognizes the case where a clause in a join's ON
only
references the nullable side. We however don't have code that recognizes
the
same if it's the non-nullable side.
That's somewhat surprising, but it does kinda make sense: A after all,
in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE
list,
rather than inside the join's ON. The same isn't true for the nullable
side of
the join, as a condition for te nullable side in the WHERE clause breaks
the
join's "outerness".
I.e. you can write your query as
SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id =1 AND p.dtype = 'B';
in which case you get the expected query plan:
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN
│
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join (cost=0.31..16.36 rows=1 width=40) (actual
time=0.035..0.036 rows=0 loops=1) │
│ -> Index Scan using parent_pkey on parent p (cost=0.15..8.17
rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
│ Index Cond: (id = 1)
│
│ Filter: (dtype = 'B'::text)
│
│ Rows Removed by Filter: 1
│
│ -> Index Only Scan using child_pkey on child c (cost=0.15..8.17
rows=1 width=4) (never executed) │
│ Index Cond: (id = 1)
│
│ Heap Fetches: 0
│
│ Planning Time: 29.912 ms
│
│ Execution Time: 0.095 ms
│
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Ah, wait - that's not actually equivalent. When p.dtype = 'B' inthe ON
clause,
we still produce an output row, but not if in the WHERE clause.So:
ISTM that it shouldn't be expensive to recognize this type of join
clause and
pushes them down. While it could be done by the query's author, it seems
worth
handling this on our side. But maybe I'm missing something here?
yes, I was missing something, it would not be a valid transformation.
I guess it's still somewhat silly that we do an index scan for the inner
side,
even though we could know that we'll always fail to the joinqual. We could
evaluate the qual after fetching the outer row, before fetching the
matching
inner row.It's might not be worth adding code to handle such cases to the the nested
loop code, this is probably not that common a query pattern. If we don't
want
to have explict execution time code paths, we could emit a "constant
qualification" Result node above the inner side of a parametrized nested
loop?Greetings,
Andres Freund
Hi,
On 2024-12-07 17:06:52 -0500, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side. But maybe I'm missing something here?No, that condition *can't* be pushed down to the LHS scan, because its
failure should not remove LHS rows from the output; it can only cause
them to have nulls in the RHS columns.
Yea, just sent an email saying so after realizing the same.
One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.
But this would be a bunch of new mechanism that's only useful for
outer joins, only for rather hokey outer join conditions, and only
for nestloop-type joins. I'm pretty dubious that it's worth the
trouble -- not least because I don't recall anybody complaining
about this before.
As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.
Not convinced that such queries are that hokey though, it's not that rare a
thing to want to left join to some other table only if the outer side matches
some attribute. We ourselves do have a few cases like that in our queries,
e.g. psql's describeOneTableDetails(), pg_dump's getTables() and several
others. ...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-12-07 17:06:52 -0500, Tom Lane wrote:
One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.
As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.
Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.
regards, tom lane
On 12/8/24 06:13, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2024-12-07 17:06:52 -0500, Tom Lane wrote:
One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.As I wrote in my other email, I'm also somewhat dubious it's worth having
explicit code for this in nodeNestloop.c.Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.
Well, I have been working on this topic for some time. Let me share some
experiences and thoughts. They may be helpful.
I had a user request on this feature, a kind of 'semi-pushdown' clause.
As production people said, it is quite typical in sales systems to have
a table of products and additional tables describing product categories.
Something like that:
CREATE TABLE products (
id int PRIMARY KEY, payload text DEFAULT 'product',
price real DEFAULT random(), type text);
CREATE TABLE vehicles (id int PRIMARY KEY, maxspeed int DEFAULT 250);
CREATE TABLE phones (id int PRIMARY KEY, screensize real DEFAULT 6.1);
INSERT INTO products (id,type)
SELECT x,'v' FROM generate_series(1,5E4) AS x;
INSERT INTO products (id,type)
SELECT x,'p' FROM generate_series(1+5E4,1E5) AS x;
INSERT INTO vehicles (id) SELECT x FROM generate_series(1,5E4) AS x;
INSERT INTO phones (id) SELECT x FROM generate_series(1+5E4,1E5) AS x;
VACUUM ANALYZE products, vehicles, phones;
They usually need to get top-sales products from different categories
querying database with something like that:
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id)
LEFT JOIN vehicles v ON (p.id = v.id)
WHERE p.price > 0.9;
Users just want to optimise such queries and get description of
different products within a single query. The query they wish to execute
looks like this:
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id AND p.type = 'p')
LEFT JOIN vehicles v ON (p.id = v.id AND p.type = 'v')
WHERE p.price > 0.9;
The request was: why not check the RHS-only clause inside a join node
and save cycles?
To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to touch.
Also, I wonder why you think it could work with NestLoop only. I think
avoiding touching a hash table and an index under MergeJoin can also be
beneficial.
The patch and reproduction script with resulting EXPLAINs are in the
attachment. Petr Petrov is doing further work. He may provide additional
details, if any.
--
regards, Andrei Lepikhov
Attachments:
v2-lazy-join.difftext/x-patch; charset=UTF-8; name=v2-lazy-join.diffDownload
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 7f4bf6c4db..fd80dc2192 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -66,6 +66,7 @@ ExecNestLoop(PlanState *pstate)
TupleTableSlot *outerTupleSlot;
TupleTableSlot *innerTupleSlot;
ExprState *joinqual;
+ ExprState *rhs_joinqual;
ExprState *otherqual;
ExprContext *econtext;
ListCell *lc;
@@ -79,6 +80,7 @@ ExecNestLoop(PlanState *pstate)
nl = (NestLoop *) node->js.ps.plan;
joinqual = node->js.joinqual;
+ rhs_joinqual = node->js.rhs_joinqual;
otherqual = node->js.ps.qual;
outerPlan = outerPlanState(node);
innerPlan = innerPlanState(node);
@@ -156,8 +158,15 @@ ExecNestLoop(PlanState *pstate)
*/
ENL1_printf("getting new inner tuple");
- innerTupleSlot = ExecProcNode(innerPlan);
- econtext->ecxt_innertuple = innerTupleSlot;
+ if (rhs_joinqual && !ExecQual(rhs_joinqual, econtext))
+ {
+ innerTupleSlot = NULL;
+ }
+ else
+ {
+ innerTupleSlot = ExecProcNode(innerPlan);
+ econtext->ecxt_innertuple = innerTupleSlot;
+ }
if (TupIsNull(innerTupleSlot))
{
@@ -314,6 +323,8 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
nlstate->js.jointype = node->join.jointype;
nlstate->js.joinqual =
ExecInitQual(node->join.joinqual, (PlanState *) nlstate);
+ nlstate->js.rhs_joinqual =
+ ExecInitQual(node->join.rhs_joinqual, (PlanState *) nlstate);
/*
* detect whether we need only consider the first matching inner tuple
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f2ed0d81f6..003f949547 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -232,7 +232,7 @@ static RecursiveUnion *make_recursive_union(List *tlist,
static BitmapAnd *make_bitmap_and(List *bitmapplans);
static BitmapOr *make_bitmap_or(List *bitmapplans);
static NestLoop *make_nestloop(List *tlist,
- List *joinclauses, List *otherclauses, List *nestParams,
+ List *joinclauses, List *rhs_joinclauses, List *otherclauses, List *nestParams,
Plan *lefttree, Plan *righttree,
JoinType jointype, bool inner_unique);
static HashJoin *make_hashjoin(List *tlist,
@@ -4352,6 +4352,7 @@ create_nestloop_plan(PlannerInfo *root,
Plan *inner_plan;
List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
+ List *rhs_joinclauses = NIL;
List *joinclauses;
List *otherclauses;
Relids outerrelids;
@@ -4390,6 +4391,11 @@ create_nestloop_plan(PlannerInfo *root,
/* Sort join qual clauses into best execution order */
joinrestrictclauses = order_qual_clauses(root, joinrestrictclauses);
+ foreach_node(RestrictInfo, rinfo, best_path->jpath.rhs_joinrinfo)
+ {
+ rhs_joinclauses = lappend(rhs_joinclauses, rinfo->clause);
+ }
+
/* Get the join qual clauses (in plain expression form) */
/* Any pseudoconstant clauses are ignored here */
if (IS_OUTER_JOIN(best_path->jpath.jointype))
@@ -4423,6 +4429,7 @@ create_nestloop_plan(PlannerInfo *root,
join_plan = make_nestloop(tlist,
joinclauses,
+ rhs_joinclauses,
otherclauses,
nestParams,
outer_plan,
@@ -6021,6 +6028,7 @@ make_bitmap_or(List *bitmapplans)
static NestLoop *
make_nestloop(List *tlist,
List *joinclauses,
+ List *rhs_joinclauses,
List *otherclauses,
List *nestParams,
Plan *lefttree,
@@ -6038,6 +6046,23 @@ make_nestloop(List *tlist,
node->join.jointype = jointype;
node->join.inner_unique = inner_unique;
node->join.joinqual = joinclauses;
+
+ {
+ ListCell *lc;
+
+ foreach(lc, node->join.joinqual)
+ {
+ Node *qual = (Node *) lfirst(lc);
+
+ if (!list_member(rhs_joinclauses, qual))
+ continue;
+
+ node->join.joinqual = foreach_delete_current(node->join.joinqual, lc);
+ }
+
+ }
+
+ node->join.rhs_joinqual = rhs_joinclauses;
node->nestParams = nestParams;
return node;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 91c7c4fe2f..0f54450ead 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2294,6 +2294,14 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
rtoffset,
NRM_EQUAL,
NUM_EXEC_QUAL((Plan *) join));
+ join->rhs_joinqual = fix_join_expr(root,
+ join->rhs_joinqual,
+ outer_itlist,
+ inner_itlist,
+ (Index) 0,
+ rtoffset,
+ NRM_EQUAL,
+ NUM_EXEC_QUAL((Plan *) join));
/* Now do join-type-specific stuff */
if (IsA(join, NestLoop))
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..b7f1dbac23 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2603,6 +2603,13 @@ create_nestloop_path(PlannerInfo *root,
pathnode->jpath.innerjoinpath = inner_path;
pathnode->jpath.joinrestrictinfo = restrict_clauses;
+ foreach_node(RestrictInfo, rinfo, restrict_clauses)
+ {
+ if (bms_is_subset(rinfo->clause_relids, outer_path->parent->relids))
+ pathnode->jpath.rhs_joinrinfo =
+ lappend(pathnode->jpath.rhs_joinrinfo, rinfo);
+ }
+
final_cost_nestloop(root, pathnode, workspace, extra);
return pathnode;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 182a6956bb..14c7610c8c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2125,6 +2125,7 @@ typedef struct JoinState
bool single_match; /* True if we should skip to next outer tuple
* after finding one inner match */
ExprState *joinqual; /* JOIN quals (in addition to ps.qual) */
+ ExprState *rhs_joinqual;
} JoinState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..c28c23c983 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2085,7 +2085,7 @@ typedef struct JoinPath
Path *innerjoinpath; /* path for the inner side of the join */
List *joinrestrictinfo; /* RestrictInfos to apply to join */
-
+ List *rhs_joinrinfo; /* Outer-side join clauses (filters) */
/*
* See the notes for RelOptInfo and ParamPathInfo to understand why
* joinrestrictinfo is needed in JoinPath, and can't be merged into the
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 52f29bcdb6..2c90a63e59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -791,7 +791,8 @@ typedef struct Join
Plan plan;
JoinType jointype;
bool inner_unique;
- List *joinqual; /* JOIN quals (in addition to plan.qual) */
+ List *joinqual;
+ List *rhs_joinqual; /* JOIN quals (in addition to plan.qual) */
} Join;
/* ----------------
Hi,
On 2024-12-08 09:23:42 +0700, Andrei Lepikhov wrote:
To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to touch.
Cool!
Also, I wonder why you think it could work with NestLoop only.
It doesn't seem to apply to mergejoins. For hashjoins, it doesn't seem worth a
separate evaluation of a expression - something which has noticeable overhead
on its own - given that the cost of a lookup in the hashtable isn't
particularly high. Compare that to the nestloop case where you always need an
indexscan and often will have more complicated subtrees being executed
unnecessarily on the inner side.
I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.
How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2024-12-08 09:23:42 +0700, Andrei Lepikhov wrote:
I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.
How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?
Yeah, sounds like nonsense to me too. The reason this can be a win
for nestloop is that we perform a new scan of the inner relation
for each outer row, and if we can skip an inner scan altogether
then we're ahead. But mergejoin and hashjoin only scan each input
once anyway, so I see no opportunity to save any scan work.
It's barely possible that this is interesting for hash join
because you could save scanning a hash list ... but frankly
I don't believe there could be enough win there to justify
additional mechanism. Hash join is in a world of hurt already
if there are lots of duplicate hash codes.
regards, tom lane
On 8/12/2024 09:52, Andres Freund wrote:
I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?
In my mind, this trick can be designed for specific cases like sales
tables, as illustrated before and used by well-rounded developers. I'm
not sure that such optimisation would be profitable in general. My point
is that the sales database has lots of categories, and when requesting
product descriptions, we will not necessarily touch all the categories -
in that case, the one-sided clause could allow us to avoid scanning some
tables at all. Am I wrong?
BTW, may it be used in SEMI JOIN cases?
--
regards, Andrei Lepikhov
Regarding merge joins, I suppose in some edge cases inner set scan might
not even be started.
FROM parent p
LEFT JOIN child c
ON p.id = c.id
AND p.dtype = 'B'
┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^
If p.dtype = 'B' was evaluated early, the pointer could move through the
outer set as long as it is evaluated to false.
In the above example, it reaches the end without even accessing the inner
set.
┌───┐ ┌───┐
parent │1,A│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^
In the opposite scenario:
┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^
it would need to start moving the inner pointer at some point.
┌───┐ ┌───┐
parent │1,B│ │2,A│
└───┘ └───┘
^
┌───┐ ┌───┐
child │ 1 │ │ 2 │
└───┘ └───┘
^
But even in this case, the pointer may not reach the end in the inner set.
Though this highly depends on how merge join is implemented in the Postgres
code. I have to admit that I have a very vague idea on this...
вс, 8 дек. 2024 г. в 11:44, Andrei Lepikhov <lepihov@gmail.com>:
Show quoted text
On 8/12/2024 09:52, Andres Freund wrote:
I think avoiding touching a hash table and an index under MergeJoin can
also
be beneficial.
How would you get significant wins for mergejoins? You need to go
through both
inner and outer anyway?
In my mind, this trick can be designed for specific cases like sales
tables, as illustrated before and used by well-rounded developers. I'm
not sure that such optimisation would be profitable in general. My point
is that the sales database has lots of categories, and when requesting
product descriptions, we will not necessarily touch all the categories -
in that case, the one-sided clause could allow us to avoid scanning some
tables at all. Am I wrong?
BTW, may it be used in SEMI JOIN cases?--
regards, Andrei Lepikhov
On 2024-12-08 15:44:23 +0700, Andrei Lepikhov wrote:
On 8/12/2024 09:52, Andres Freund wrote:
I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?
In my mind, this trick can be designed for specific cases like sales tables,
as illustrated before and used by well-rounded developers.
I'm not saying that the optimization isn't useful, just that i don't think it
makes much sense for mergeoins.
Besides, at least as far as I can see, fundamentally not optimizing mergejoins
in any substantial manner, it also just doesn't seem likely that queries where
this optimization would come up are likely to be planned as mergejoins. If you
have a leftjoin to a category-style table, you're very rarely going to scan
the "main" table with an ordered index scan on the category key, which would
be required for a merge join. And sorting the main table once for each
to-be-joined-category-table isn't a desirable path most of the time either.
I don't know what it means that it's to be "used by well-rounded
developers". We have to write the implementation in way it works regardless of
what kind of developer is using postgres.
I'm not sure that such optimisation would be profitable in general.
Are you suggesting it would only be enabled by a GUC?
My point is that the sales database has lots of categories, and when
requesting product descriptions, we will not necessarily touch all the
categories - in that case, the one-sided clause could allow us to avoid
scanning some tables at all. Am I wrong?
No, I don't think you're wrong. I just don't think it's worthwhile for
mergejoins, because I don't see relevant query patterns where it would
help.
BTW, may it be used in SEMI JOIN cases?
Seems possible.
Greetings,
Andres Freund
On 8/12/2024 23:37, Andres Freund wrote:
On 2024-12-08 15:44:23 +0700, Andrei Lepikhov wrote:
On 8/12/2024 09:52, Andres Freund wrote:
I think avoiding touching a hash table and an index under MergeJoin can also
be beneficial.How would you get significant wins for mergejoins? You need to go through both
inner and outer anyway?In my mind, this trick can be designed for specific cases like sales tables,
as illustrated before and used by well-rounded developers.I'm not saying that the optimization isn't useful, just that i don't think it
makes much sense for mergeoins.Besides, at least as far as I can see, fundamentally not optimizing mergejoins
in any substantial manner, it also just doesn't seem likely that queries where
this optimization would come up are likely to be planned as mergejoins. If you
have a leftjoin to a category-style table, you're very rarely going to scan
the "main" table with an ordered index scan on the category key, which would
be required for a merge join. And sorting the main table once for each
to-be-joined-category-table isn't a desirable path most of the time either.I don't know what it means that it's to be "used by well-rounded
developers". We have to write the implementation in way it works regardless of
what kind of developer is using postgres.
I got it, thanks. I agree that the profit for MJ & HJ cases looks
modest. When designing the prototype, I realised that it seemed more
"symmetrical" and logical to separate such 'one-sided' clauses during
the planning for all types of join.
But I don't insist on this statement. It would be OK even in the case of
NestLoop (remember, we should change the NL cost model too).
I'm not sure that such optimisation would be profitable in general.
Are you suggesting it would only be enabled by a GUC?
No, such a feature probably won't generate much overhead; I don't see
any reason to cover it under a GUC.
--
regards, Andrei Lepikhov
пн, 16 дек. 2024 г. в 12:52, Andrei Lepikhov <lepihov@gmail.com>:
On 12/8/24 06:13, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2024-12-07 17:06:52 -0500, Tom Lane wrote:
One could imagine that we split up the join filter conditions into
"depends on RHS" and "doesn't depend on RHS" subsets, and make the
nestloop plan node evaluate the latter set only once per LHS row,
and then skip the inner-side scan when that condition fails.As I wrote in my other email, I'm also somewhat dubious it's worth
having
explicit code for this in nodeNestloop.c.
Yeah. Your idea of pushing the "doesn't depend on RHS" subset into
a one-time Result filter atop the RHS is interesting though. Then we
don't need any new executor machinery, but we pay for that with a more
complex planner patch. Not sure how hard that would be.Well, I have been working on this topic for some time. Let me share some
experiences and thoughts. They may be helpful.
I had a user request on this feature, a kind of 'semi-pushdown' clause.
As production people said, it is quite typical in sales systems to have
a table of products and additional tables describing product categories.
Something like that:CREATE TABLE products (
id int PRIMARY KEY, payload text DEFAULT 'product',
price real DEFAULT random(), type text);
CREATE TABLE vehicles (id int PRIMARY KEY, maxspeed int DEFAULT 250);
CREATE TABLE phones (id int PRIMARY KEY, screensize real DEFAULT 6.1);INSERT INTO products (id,type)
SELECT x,'v' FROM generate_series(1,5E4) AS x;
INSERT INTO products (id,type)
SELECT x,'p' FROM generate_series(1+5E4,1E5) AS x;
INSERT INTO vehicles (id) SELECT x FROM generate_series(1,5E4) AS x;
INSERT INTO phones (id) SELECT x FROM generate_series(1+5E4,1E5) AS x;
VACUUM ANALYZE products, vehicles, phones;They usually need to get top-sales products from different categories
querying database with something like that:EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id)
LEFT JOIN vehicles v ON (p.id = v.id)
WHERE p.price > 0.9;Users just want to optimise such queries and get description of
different products within a single query. The query they wish to execute
looks like this:EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM products p
LEFT JOIN phones ph ON (p.id = ph.id AND p.type = 'p')
LEFT JOIN vehicles v ON (p.id = v.id AND p.type = 'v')
WHERE p.price > 0.9;The request was: why not check the RHS-only clause inside a join node
and save cycles?To determine how difficult it could be, I wrote a prototype (very raw),
which shows how it works in action and how many subsystems we have to
touch.
Also, I wonder why you think it could work with NestLoop only. I think
avoiding touching a hash table and an index under MergeJoin can also be
beneficial.The patch and reproduction script with resulting EXPLAINs are in the
attachment. Petr Petrov is doing further work. He may provide additional
details, if any.--
regards, Andrei Lepikhov
Hello!
I would like to describe the main idea:
1) We would like to speed up Nested Loop Left Join execution since getting
the next inner tuple could be expensive.
Also, not all outer tuples should be matched. For example, we have 200k
rows and inner tuples are extracted only for 100 of them.
That could allow us to execute Nested Loop Left Join faster.
2) When we analyze RestrictInfo's clauses it is possible to detect whether
it's sufficient to use outer tuple data to calculate them.
If it is the case, then we save them in rhs_joinrinfo. In the current
patch version that was done in create_nestloop_path().
Then in make_nestloop() rhs_joinqual clauses were removed from
joinqual.
Then execute rhs_joinqual in ExecNestLoop() before the next inner tuple
extraction.
If the result of their evaluation is false then we fill inner tuple's
fields with nulls and return the result without visiting the inner
relation.
3) I also propose to add a new type of filter: Outer Tuple Filter.
It's a filter which is evaluated after getting the outer tuple but
before extracting the inner tuple.
Join filter is evaluated after grabbing the inner tuple, that's why I
suggest to differentiate them.
To calculate unmatched tuples by Outer Tuple Filter an additional
counter was added to Instrumentation struct: int nunmatched.
You can find the execution plan in reproduction.sql
The patch was based on commit 3191eccd8a9bff1715f2e4fab86d2932a556185e
At least, make check-world has passed.
That's my first experience in making patches, so apologize if I
misunderstand or explain something poorly.
Looking forward to your questions and feedback.
---
Best regards,
Peter Petrov
Attachments:
v3-lazy-join.difftext/x-patch; charset=US-ASCII; name=v3-lazy-join.diffDownload
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bf322198a20..fd145421a5b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2407,7 +2407,7 @@ SELECT q.a, ft2.c1 FROM (SELECT 13 FROM ft1 WHERE c1 = 13) q(a) RIGHT JOIN ft2 O
---------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join
Output: (13), ft2.c1
- Join Filter: (13 = ft2.c1)
+ Outer Tuple Filter: (13 = ft2.c1)
-> Foreign Scan on public.ft2
Output: ft2.c1
Remote SQL: SELECT "C 1" FROM "S 1"."T 1" WHERE (("C 1" >= 10)) AND (("C 1" <= 15)) ORDER BY "C 1" ASC NULLS LAST
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a201ed30824..3f6c3dddc41 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2302,6 +2302,12 @@ ExplainNode(PlanState *planstate, List *ancestors,
}
break;
case T_NestLoop:
+ show_upper_qual(((NestLoop *) plan)->join.rhs_joinqual,
+ "Outer Tuple Filter", planstate, ancestors, es);
+ if (((NestLoop *) plan)->join.rhs_joinqual)
+ show_instrumentation_count("Rows Unmatched by Outer Tuple Filter", 3,
+ planstate, es);
+
show_upper_qual(((NestLoop *) plan)->join.joinqual,
"Join Filter", planstate, ancestors, es);
if (((NestLoop *) plan)->join.joinqual)
@@ -3957,6 +3963,8 @@ show_instrumentation_count(const char *qlabel, int which,
if (which == 2)
nfiltered = planstate->instrument->nfiltered2;
+ else if (which == 3)
+ nfiltered = planstate->instrument->nunmatched;
else
nfiltered = planstate->instrument->nfiltered1;
nloops = planstate->instrument->nloops;
diff --git a/src/backend/executor/nodeNestloop.c b/src/backend/executor/nodeNestloop.c
index 7f4bf6c4dbb..4a3d7fa0dd4 100644
--- a/src/backend/executor/nodeNestloop.c
+++ b/src/backend/executor/nodeNestloop.c
@@ -66,6 +66,7 @@ ExecNestLoop(PlanState *pstate)
TupleTableSlot *outerTupleSlot;
TupleTableSlot *innerTupleSlot;
ExprState *joinqual;
+ ExprState *rhs_joinqual;
ExprState *otherqual;
ExprContext *econtext;
ListCell *lc;
@@ -79,6 +80,7 @@ ExecNestLoop(PlanState *pstate)
nl = (NestLoop *) node->js.ps.plan;
joinqual = node->js.joinqual;
+ rhs_joinqual = node->js.rhs_joinqual;
otherqual = node->js.ps.qual;
outerPlan = outerPlanState(node);
innerPlan = innerPlanState(node);
@@ -152,12 +154,23 @@ ExecNestLoop(PlanState *pstate)
}
/*
- * we have an outerTuple, try to get the next inner tuple.
+ * we have an outerTuple, try to execute quals related to it. If the
+ * result is false then we won't get the next inner tuple. Otherwise,
+ * extract it.
*/
- ENL1_printf("getting new inner tuple");
+ ENL1_printf("executing quals related to outer tuple if any");
- innerTupleSlot = ExecProcNode(innerPlan);
- econtext->ecxt_innertuple = innerTupleSlot;
+ if (rhs_joinqual && !ExecQual(rhs_joinqual, econtext))
+ {
+ InstrCountUnmatched(node, 1);
+ innerTupleSlot = NULL;
+ }
+ else
+ {
+ ENL1_printf("getting new inner tuple");
+ innerTupleSlot = ExecProcNode(innerPlan);
+ econtext->ecxt_innertuple = innerTupleSlot;
+ }
if (TupIsNull(innerTupleSlot))
{
@@ -314,6 +327,8 @@ ExecInitNestLoop(NestLoop *node, EState *estate, int eflags)
nlstate->js.jointype = node->join.jointype;
nlstate->js.joinqual =
ExecInitQual(node->join.joinqual, (PlanState *) nlstate);
+ nlstate->js.rhs_joinqual =
+ ExecInitQual(node->join.rhs_joinqual, (PlanState *) nlstate);
/*
* detect whether we need only consider the first matching inner tuple
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 178c572b021..51a1a29fd2b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -232,7 +232,7 @@ static RecursiveUnion *make_recursive_union(List *tlist,
static BitmapAnd *make_bitmap_and(List *bitmapplans);
static BitmapOr *make_bitmap_or(List *bitmapplans);
static NestLoop *make_nestloop(List *tlist,
- List *joinclauses, List *otherclauses, List *nestParams,
+ List *joinclauses, List *rhs_joinclauses, List *otherclauses, List *nestParams,
Plan *lefttree, Plan *righttree,
JoinType jointype, bool inner_unique);
static HashJoin *make_hashjoin(List *tlist,
@@ -4352,7 +4352,9 @@ create_nestloop_plan(PlannerInfo *root,
Plan *inner_plan;
List *tlist = build_path_tlist(root, &best_path->jpath.path);
List *joinrestrictclauses = best_path->jpath.joinrestrictinfo;
+ List *rhs_joinclauses = NIL;
List *joinclauses;
+ List *rhs_otherclauses = NIL;
List *otherclauses;
Relids outerrelids;
List *nestParams;
@@ -4397,6 +4399,10 @@ create_nestloop_plan(PlannerInfo *root,
extract_actual_join_clauses(joinrestrictclauses,
best_path->jpath.path.parent->relids,
&joinclauses, &otherclauses);
+
+ extract_actual_join_clauses(best_path->jpath.rhs_joinrinfo,
+ best_path->jpath.path.parent->relids,
+ &rhs_joinclauses, &rhs_otherclauses);
}
else
{
@@ -4423,6 +4429,7 @@ create_nestloop_plan(PlannerInfo *root,
join_plan = make_nestloop(tlist,
joinclauses,
+ rhs_joinclauses,
otherclauses,
nestParams,
outer_plan,
@@ -6019,6 +6026,7 @@ make_bitmap_or(List *bitmapplans)
static NestLoop *
make_nestloop(List *tlist,
List *joinclauses,
+ List *rhs_joinclauses,
List *otherclauses,
List *nestParams,
Plan *lefttree,
@@ -6036,6 +6044,24 @@ make_nestloop(List *tlist,
node->join.jointype = jointype;
node->join.inner_unique = inner_unique;
node->join.joinqual = joinclauses;
+
+ {
+ ListCell *lc;
+
+ /* Remove quals from joinqual which belongs to outer relation */
+ foreach(lc, node->join.joinqual)
+ {
+ Node *qual = (Node *) lfirst(lc);
+
+ if (!list_member(rhs_joinclauses, qual))
+ continue;
+
+ node->join.joinqual = foreach_delete_current(node->join.joinqual, lc);
+ }
+
+ }
+
+ node->join.rhs_joinqual = rhs_joinclauses;
node->nestParams = nestParams;
return node;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6d23df108da..2499dbfa94b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2292,6 +2292,16 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset)
NRM_EQUAL,
NUM_EXEC_QUAL((Plan *) join));
+ /* Process rhs_joinqual as well */
+ join->rhs_joinqual = fix_join_expr(root,
+ join->rhs_joinqual,
+ outer_itlist,
+ inner_itlist,
+ (Index) 0,
+ rtoffset,
+ NRM_EQUAL,
+ NUM_EXEC_QUAL((Plan *) join));
+
/* Now do join-type-specific stuff */
if (IsA(join, NestLoop))
{
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee26..bfd5bd3df59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2603,6 +2603,14 @@ create_nestloop_path(PlannerInfo *root,
pathnode->jpath.innerjoinpath = inner_path;
pathnode->jpath.joinrestrictinfo = restrict_clauses;
+ /* If clause_relids belong to outerpath then add it to rhs_joinrinfo list */
+ foreach_node(RestrictInfo, rinfo, restrict_clauses)
+ {
+ if (bms_is_subset(rinfo->clause_relids, outer_path->parent->relids))
+ pathnode->jpath.rhs_joinrinfo =
+ lappend(pathnode->jpath.rhs_joinrinfo, rinfo);
+ }
+
final_cost_nestloop(root, pathnode, workspace, extra);
return pathnode;
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index bfd7b6d8445..c79f67f4d08 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,6 +88,8 @@ typedef struct Instrumentation
double nloops; /* # of run cycles for this node */
double nfiltered1; /* # of tuples removed by scanqual or joinqual */
double nfiltered2; /* # of tuples removed by "other" quals */
+ double nunmatched; /* # of tuples removed by quals related to
+ * only outer tuples */
BufferUsage bufusage; /* total buffer usage */
WalUsage walusage; /* total WAL usage */
} Instrumentation;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 7f71b7625df..815f7b3fd5b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1237,6 +1237,11 @@ typedef struct PlanState
if (((PlanState *)(node))->instrument) \
((PlanState *)(node))->instrument->nfiltered2 += (delta); \
} while(0)
+#define InstrCountUnmatched(node, delta) \
+ do { \
+ if (((PlanState *)(node))->instrument) \
+ ((PlanState *)(node))->instrument->nunmatched += (delta); \
+ } while(0)
/*
* EPQState is state for executing an EvalPlanQual recheck on a candidate
@@ -2124,6 +2129,8 @@ typedef struct JoinState
bool single_match; /* True if we should skip to next outer tuple
* after finding one inner match */
ExprState *joinqual; /* JOIN quals (in addition to ps.qual) */
+ ExprState *rhs_joinqual; /* JOIN quals which can be executed using only
+ * outer tuple */
} JoinState;
/* ----------------
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0759e00e96d..dd84cd96815 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2087,6 +2087,7 @@ typedef struct JoinPath
Path *innerjoinpath; /* path for the inner side of the join */
List *joinrestrictinfo; /* RestrictInfos to apply to join */
+ List *rhs_joinrinfo; /* Outer-side join clauses (filters) */
/*
* See the notes for RelOptInfo and ParamPathInfo to understand why
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 52f29bcdb69..8f8030803a5 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -792,6 +792,8 @@ typedef struct Join
JoinType jointype;
bool inner_unique;
List *joinqual; /* JOIN quals (in addition to plan.qual) */
+ List *rhs_joinqual; /* JOIN quals which can be executed by using
+ * only outer tuple */
} Join;
/* ----------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 1904eb65bb9..b1ca1608666 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2189,7 +2189,7 @@ SELECT count(*) FROM tenk1 LEFT JOIN tenk2 ON
------------------------------------------------------------------------------------
Aggregate
-> Nested Loop Left Join
- Join Filter: (tenk1.hundred = 42)
+ Outer Tuple Filter: (tenk1.hundred = 42)
-> Index Only Scan using tenk1_hundred on tenk1
-> Memoize
Cache Key: tenk1.hundred
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 55d44a9bcee..1b7d495abf9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2430,7 +2430,7 @@ where t4.f1 is null;
-> Seq Scan on int4_tbl t2
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t3.f1 > 1)
+ Outer Tuple Filter: (t3.f1 > 1)
-> Seq Scan on int4_tbl t3
Filter: (f1 > 0)
-> Materialize
@@ -2458,9 +2458,9 @@ from int4_tbl t1 left join int4_tbl t2 on true
-> Seq Scan on int4_tbl t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t3.f1 > 0)
+ Outer Tuple Filter: (t3.f1 > 0)
-> Nested Loop Left Join
- Join Filter: (t2.f1 > 0)
+ Outer Tuple Filter: (t2.f1 > 0)
-> Seq Scan on int4_tbl t2
-> Materialize
-> Seq Scan on int4_tbl t3
@@ -2523,7 +2523,7 @@ select * from int4_tbl t1
QUERY PLAN
-------------------------------------------------
Nested Loop Left Join
- Join Filter: (t2.f1 = t3.f1)
+ Outer Tuple Filter: (t2.f1 = t3.f1)
-> Nested Loop Left Join
-> Nested Loop Left Join
-> Seq Scan on int4_tbl t1
@@ -2561,12 +2561,12 @@ select * from int4_tbl t1
left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
where t1.f1 = coalesce(t2.f1, 1);
- QUERY PLAN
-----------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------
Nested Loop Left Join
- Join Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
+ Outer Tuple Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
-> Nested Loop Left Join
- Join Filter: (t2.f1 > 0)
+ Outer Tuple Filter: (t2.f1 > 0)
-> Nested Loop Left Join
Filter: (t1.f1 = COALESCE(t2.f1, 1))
-> Seq Scan on int4_tbl t1
@@ -2590,9 +2590,9 @@ select * from int4_tbl t1
Hash Right Join
Hash Cond: (t2.f1 = t1.f1)
-> Nested Loop Left Join
- Join Filter: (t2.f1 > 1)
+ Outer Tuple Filter: (t2.f1 > 1)
-> Nested Loop Left Join
- Join Filter: (t2.f1 > 0)
+ Outer Tuple Filter: (t2.f1 > 0)
Filter: (t3.f1 IS NULL)
-> Seq Scan on int4_tbl t2
-> Materialize
@@ -2612,11 +2612,11 @@ select * from int4_tbl t1
QUERY PLAN
-----------------------------------------------------------------
Nested Loop Left Join
- Join Filter: (t2.f1 > 1)
+ Outer Tuple Filter: (t2.f1 > 1)
-> Hash Right Join
Hash Cond: (t2.f1 = t1.f1)
-> Nested Loop Left Join
- Join Filter: (t2.f1 > 0)
+ Outer Tuple Filter: (t2.f1 > 0)
Filter: (t2.f1 <> COALESCE(t3.f1, '-1'::integer))
-> Seq Scan on int4_tbl t2
-> Materialize
@@ -2662,7 +2662,7 @@ on t2.q2 = 123;
-> Nested Loop Left Join
Join Filter: (t2.q1 = t5.q1)
-> Nested Loop Left Join
- Join Filter: false
+ Outer Tuple Filter: false
-> Seq Scan on int8_tbl t2
Filter: (q2 = 123)
-> Result
@@ -2676,13 +2676,13 @@ select * from int8_tbl t1
left join lateral
(select * from int8_tbl t3 where t3.q1 = t2.q1 offset 0) s
on t2.q1 = 1;
- QUERY PLAN
--------------------------------------------
+ QUERY PLAN
+-----------------------------------------------
Nested Loop Left Join
-> Seq Scan on int8_tbl t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t2.q1 = 1)
+ Outer Tuple Filter: (t2.q1 = 1)
-> Seq Scan on int8_tbl t2
-> Seq Scan on int8_tbl t3
Filter: (q1 = t2.q1)
@@ -2700,7 +2700,7 @@ select * from int8_tbl t1
-> Seq Scan on int8_tbl t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t2.q1 = 1)
+ Outer Tuple Filter: (t2.q1 = 1)
-> Seq Scan on int8_tbl t2
-> Function Scan on generate_series
(7 rows)
@@ -2711,13 +2711,13 @@ select * from int8_tbl t1
left join lateral
(select t2.q1 from int8_tbl t3) s
on t2.q1 = 1;
- QUERY PLAN
--------------------------------------------
+ QUERY PLAN
+-----------------------------------------------
Nested Loop Left Join
-> Seq Scan on int8_tbl t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t2.q1 = 1)
+ Outer Tuple Filter: (t2.q1 = 1)
-> Seq Scan on int8_tbl t2
-> Seq Scan on int8_tbl t3
(7 rows)
@@ -2728,13 +2728,13 @@ select * from onek t1
left join lateral
(select * from onek t3 where t3.two = t2.two offset 0) s
on t2.unique1 = 1;
- QUERY PLAN
---------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------
Nested Loop Left Join
-> Seq Scan on onek t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t2.unique1 = 1)
+ Outer Tuple Filter: (t2.unique1 = 1)
-> Seq Scan on onek t2
-> Memoize
Cache Key: t2.two
@@ -4196,7 +4196,7 @@ select unique1, x from tenk1 left join f_immutable_int4(1) x on unique1 = x;
QUERY PLAN
----------------------------------------------------
Nested Loop Left Join
- Join Filter: (tenk1.unique1 = 1)
+ Outer Tuple Filter: (tenk1.unique1 = 1)
-> Index Only Scan using tenk1_unique1 on tenk1
-> Materialize
-> Result
@@ -4253,7 +4253,7 @@ where nt3.id = 1 and ss2.b3;
-> Index Scan using nt3_pkey on nt3
Index Cond: (id = 1)
-> Nested Loop Left Join
- Join Filter: (0 = nt2.nt1_id)
+ Outer Tuple Filter: (0 = nt2.nt1_id)
-> Index Scan using nt2_pkey on nt2
Index Cond: (id = nt3.nt2_id)
-> Result
@@ -4496,7 +4496,7 @@ select count(*) from
-------------------------------------------------------------------------
Aggregate
-> Nested Loop Left Join
- Join Filter: (a.unique2 = b.unique1)
+ Outer Tuple Filter: (a.unique2 = b.unique1)
-> Nested Loop
-> Nested Loop
-> Seq Scan on int4_tbl
@@ -4533,7 +4533,7 @@ select b.unique1 from
-> Nested Loop Left Join
-> Seq Scan on int4_tbl i2
-> Nested Loop Left Join
- Join Filter: (b.unique1 = 42)
+ Outer Tuple Filter: (b.unique1 = 42)
-> Nested Loop
-> Nested Loop
-> Seq Scan on int4_tbl i1
@@ -4848,7 +4848,7 @@ select t1.* from
Hash Cond: (i8.q2 = i4.f1)
-> Nested Loop Left Join
Output: t1.f1, i8.q2
- Join Filter: (t1.f1 = '***'::text)
+ Outer Tuple Filter: (t1.f1 = '***'::text)
-> Seq Scan on public.text_tbl t1
Output: t1.f1
-> Materialize
@@ -4909,7 +4909,7 @@ select t1.* from
Hash Cond: (i8.q2 = i4.f1)
-> Nested Loop Left Join
Output: t1.f1, i8.q2
- Join Filter: (t1.f1 = '***'::text)
+ Outer Tuple Filter: (t1.f1 = '***'::text)
-> Seq Scan on public.text_tbl t1
Output: t1.f1
-> Materialize
@@ -4975,7 +4975,7 @@ select t1.* from
Hash Cond: (i8.q2 = i4.f1)
-> Nested Loop Left Join
Output: t1.f1, i8.q2
- Join Filter: (t1.f1 = '***'::text)
+ Outer Tuple Filter: (t1.f1 = '***'::text)
-> Seq Scan on public.text_tbl t1
Output: t1.f1
-> Materialize
@@ -5084,7 +5084,7 @@ left join
QUERY PLAN
--------------------------------
Nested Loop Left Join
- Join Filter: false
+ Outer Tuple Filter: false
-> Result
-> Result
One-Time Filter: false
@@ -5100,16 +5100,16 @@ select 1 from
join int4_tbl i42 on ss1.a is null or i8.q1 <> i8.q2
right join (select 2 as b) ss2
on ss2.b < i4.f1;
- QUERY PLAN
------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------
Nested Loop Left Join
-> Result
-> Nested Loop
-> Nested Loop Left Join
- Join Filter: NULL::boolean
+ Outer Tuple Filter: NULL::boolean
Filter: (((1) IS NULL) OR (i8.q1 <> i8.q2))
-> Nested Loop Left Join
- Join Filter: (i4.f1 IS NOT NULL)
+ Outer Tuple Filter: (i4.f1 IS NOT NULL)
-> Seq Scan on int4_tbl i4
Filter: (2 < f1)
-> Materialize
@@ -5720,7 +5720,7 @@ from int4_tbl as t1
QUERY PLAN
--------------------------------
Nested Loop Left Join
- Join Filter: false
+ Outer Tuple Filter: false
-> Seq Scan on int4_tbl t1
-> Result
One-Time Filter: false
@@ -5742,7 +5742,7 @@ from int4_tbl as t1
QUERY PLAN
--------------------------------
Nested Loop Left Join
- Join Filter: false
+ Outer Tuple Filter: false
-> Seq Scan on int4_tbl t1
-> Result
One-Time Filter: false
@@ -5775,7 +5775,7 @@ from int8_tbl t1
QUERY PLAN
-------------------------------------------------
Nested Loop Left Join
- Join Filter: (t2.q2 < t3.unique2)
+ Outer Tuple Filter: (t2.q2 < t3.unique2)
-> Nested Loop Left Join
Join Filter: (t2.q1 > t3.unique1)
-> Hash Left Join
@@ -5915,7 +5915,7 @@ select 1 from a t1
-> Seq Scan on a t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (t2.id = 1)
+ Outer Tuple Filter: (t2.id = 1)
-> Index Only Scan using a_pkey on a t2
Index Cond: (id = 1)
-> Seq Scan on a t3
@@ -6122,7 +6122,7 @@ SELECT q2 FROM
------------------------------------------------------
Nested Loop Left Join
Output: q2
- Join Filter: NULL::boolean
+ Outer Tuple Filter: NULL::boolean
Filter: (('constant'::text) >= ('constant'::text))
-> Seq Scan on public.int4_tbl
Output: int4_tbl.f1
@@ -6144,7 +6144,7 @@ FROM int4_tbl
-> Seq Scan on int4_tbl
-> Materialize
-> Nested Loop Left Join
- Join Filter: NULL::boolean
+ Outer Tuple Filter: NULL::boolean
Filter: ((tenk1.unique1 = (42)) OR (tenk1.unique2 = (42)))
-> Seq Scan on tenk1
-> Result
@@ -7242,7 +7242,7 @@ select * from
--------------------------------
Nested Loop Left Join
Output: 0, (1), ((1))
- Join Filter: false
+ Outer Tuple Filter: false
-> Result
Output: 1
-> Result
@@ -7361,7 +7361,7 @@ select * from int8_tbl i8 left join lateral
--------------------------------------
Nested Loop Left Join
Output: i8.q1, i8.q2, f1, (i8.q2)
- Join Filter: false
+ Outer Tuple Filter: false
-> Seq Scan on public.int8_tbl i8
Output: i8.q1, i8.q2
-> Result
diff --git a/src/test/regress/expected/predicate.out b/src/test/regress/expected/predicate.out
index 965a3a76161..30b169bcf65 100644
--- a/src/test/regress/expected/predicate.out
+++ b/src/test/regress/expected/predicate.out
@@ -119,9 +119,9 @@ SELECT * FROM pred_tab t1
QUERY PLAN
-------------------------------------------
Nested Loop Left Join
- Join Filter: (t2.a IS NOT NULL)
+ Outer Tuple Filter: (t2.a IS NOT NULL)
-> Nested Loop Left Join
- Join Filter: (t1.a = 1)
+ Outer Tuple Filter: (t1.a = 1)
-> Seq Scan on pred_tab t1
-> Materialize
-> Seq Scan on pred_tab t2
@@ -135,13 +135,13 @@ EXPLAIN (COSTS OFF)
SELECT * FROM pred_tab t1
LEFT JOIN pred_tab t2 ON TRUE
LEFT JOIN pred_tab t3 ON t2.a IS NULL AND t2.b = 1;
- QUERY PLAN
----------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on pred_tab t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (false AND (t2.b = 1))
+ Outer Tuple Filter: (false AND (t2.b = 1))
-> Seq Scan on pred_tab t2
-> Result
One-Time Filter: false
@@ -156,9 +156,9 @@ SELECT * FROM pred_tab t1
QUERY PLAN
-------------------------------------------
Nested Loop Left Join
- Join Filter: (t2.a IS NULL)
+ Outer Tuple Filter: (t2.a IS NULL)
-> Nested Loop Left Join
- Join Filter: (t1.a = 1)
+ Outer Tuple Filter: (t1.a = 1)
-> Seq Scan on pred_tab t1
-> Materialize
-> Seq Scan on pred_tab t2
@@ -191,12 +191,12 @@ EXPLAIN (COSTS OFF)
SELECT * FROM pred_tab t1
LEFT JOIN pred_tab t2 ON t1.a = 1
LEFT JOIN pred_tab t3 ON t2.a IS NOT NULL OR t2.b = 1;
- QUERY PLAN
----------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------
Nested Loop Left Join
- Join Filter: ((t2.a IS NOT NULL) OR (t2.b = 1))
+ Outer Tuple Filter: ((t2.a IS NOT NULL) OR (t2.b = 1))
-> Nested Loop Left Join
- Join Filter: (t1.a = 1)
+ Outer Tuple Filter: (t1.a = 1)
-> Seq Scan on pred_tab t1
-> Materialize
-> Seq Scan on pred_tab t2
@@ -210,13 +210,13 @@ EXPLAIN (COSTS OFF)
SELECT * FROM pred_tab t1
LEFT JOIN pred_tab t2 ON TRUE
LEFT JOIN pred_tab t3 ON (t2.a IS NULL OR t2.c IS NULL) AND t2.b = 1;
- QUERY PLAN
----------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on pred_tab t1
-> Materialize
-> Nested Loop Left Join
- Join Filter: (false AND (t2.b = 1))
+ Outer Tuple Filter: (false AND (t2.b = 1))
-> Seq Scan on pred_tab t2
-> Result
One-Time Filter: false
@@ -228,12 +228,12 @@ EXPLAIN (COSTS OFF)
SELECT * FROM pred_tab t1
LEFT JOIN pred_tab t2 ON t1.a = 1
LEFT JOIN pred_tab t3 ON t2.a IS NULL OR t2.c IS NULL;
- QUERY PLAN
----------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------
Nested Loop Left Join
- Join Filter: ((t2.a IS NULL) OR (t2.c IS NULL))
+ Outer Tuple Filter: ((t2.a IS NULL) OR (t2.c IS NULL))
-> Nested Loop Left Join
- Join Filter: (t1.a = 1)
+ Outer Tuple Filter: (t1.a = 1)
-> Seq Scan on pred_tab t1
-> Materialize
-> Seq Scan on pred_tab t2
@@ -301,13 +301,13 @@ SELECT 1 FROM pred_tab t1
(pred_tab t2 LEFT JOIN pred_tab t3 ON t2.a = t3.a) ON TRUE
LEFT JOIN pred_tab t4 ON t1.a IS NULL AND t1.b = 1
RIGHT JOIN pred_tab t5 ON t1.b = t5.b;
- QUERY PLAN
----------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------
Hash Right Join
Hash Cond: (t1.b = t5.b)
-> Nested Loop Left Join
-> Nested Loop Left Join
- Join Filter: (false AND (t1.b = 1))
+ Outer Tuple Filter: (false AND (t1.b = 1))
-> Seq Scan on pred_tab t1
-> Result
One-Time Filter: false