Support "Right Semi Join" plan shapes
In thread [1]/messages/by-id/CAMbWs4_eChX1bN=vj0Uzg_7iz9Uivan+Wjjor-X87L-V27A+rw@mail.gmail.com which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'. After a preliminary investigation I think it is
beneficial and can be implemented with very short change. With 'Right
Semi Join', what we want to do is to just have the first match for each
inner tuple. For HashJoin, after scanning the hash bucket for matches
to current outer, we just need to check whether the inner tuple has been
set match and skip it if so. For MergeJoin, we can do it by avoiding
restoring inner scan to the marked tuple in EXEC_MJ_TESTOUTER, in the
case when new outer tuple == marked tuple.
As that thread is already too long, fork a new thread and attach a patch
used for discussion. The patch implements 'Right Semi Join' for
HashJoin.
[1]: /messages/by-id/CAMbWs4_eChX1bN=vj0Uzg_7iz9Uivan+Wjjor-X87L-V27A+rw@mail.gmail.com
/messages/by-id/CAMbWs4_eChX1bN=vj0Uzg_7iz9Uivan+Wjjor-X87L-V27A+rw@mail.gmail.com
Thanks
Richard
Attachments:
v1-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v1-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 5371c6045b69d7f06deeff1d3e37ded19b099714 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 18 Apr 2023 16:25:36 +0800
Subject: [PATCH v1] Support "Right Semi Join" plan shapes
---
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 14 +-
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/include/nodes/nodes.h | 1 +
src/test/regress/expected/join.out | 20 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
10 files changed, 171 insertions(+), 141 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5334c503e1..c3bd0bdcbc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1561,6 +1561,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 0a3f32f731..8bf11a869e 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -488,6 +488,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -506,8 +514,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or
+ * it's a right-semijoin, but we'll avoid the branch and
+ * just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -734,6 +743,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index cd80e61fd7..1edb1760f4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -129,7 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
List *restrictlist)
{
JoinPathExtraData extra;
- bool mergejoin_allowed = true;
+ bool mergejoin_allowed = !(jointype == JOIN_RIGHT_SEMI);
ListCell *lc;
Relids joinrelids;
@@ -206,7 +206,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* way of implementing a full outer join, so override enable_mergejoin if
* it's a full join.
*/
- if (enable_mergejoin || jointype == JOIN_FULL)
+ if ((enable_mergejoin || jointype == JOIN_FULL) &&
+ jointype != JOIN_RIGHT_SEMI)
extra.mergeclause_list = select_mergejoin_clauses(root,
joinrel,
outerrel,
@@ -2232,13 +2233,14 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-anti or right-semi join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_ANTI)
+ save_jointype == JOIN_RIGHT_ANTI ||
+ save_jointype == JOIN_RIGHT_SEMI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4c6ea3a2f0..e8290bf80b 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -884,6 +884,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index e53ea84224..4eda6eb75b 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1095,6 +1095,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f8e8fe699a..013aaf6040 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -317,6 +317,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5d59ed7890..90397bc095 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1909,18 +1909,18 @@ select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
- Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
+ QUERY PLAN
+-------------------------------------------------
+ Hash Right Semi Join
+ Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
- -> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
- -> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Hash Join
+ Hash Cond: (a.tenthous = i4.f1)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> Seq Scan on int4_tbl i4
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 762cc8e53b..b8cbde9df4 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2329,24 +2329,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2538,24 +2538,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2822,25 +2822,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3219,27 +3219,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3409,27 +3412,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3625,25 +3631,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3773,28 +3779,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index d88353d496..d524989f25 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -999,26 +999,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 0cbedc657d..9f18e227dd 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2675,10 +2675,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
--
2.31.0
On Tue, Apr 18, 2023 at 5:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
In thread [1] which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'. After a preliminary investigation I think it is
beneficial and can be implemented with very short change. With 'Right
Semi Join', what we want to do is to just have the first match for each
inner tuple. For HashJoin, after scanning the hash bucket for matches
to current outer, we just need to check whether the inner tuple has been
set match and skip it if so. For MergeJoin, we can do it by avoiding
restoring inner scan to the marked tuple in EXEC_MJ_TESTOUTER, in the
case when new outer tuple == marked tuple.As that thread is already too long, fork a new thread and attach a patch
used for discussion. The patch implements 'Right Semi Join' for
HashJoin.
The cfbot reminds that this patch does not apply any more, so rebase it
to v2.
Thanks
Richard
Attachments:
v2-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v2-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 249673fc738086df71b3623be7e6bc97b02fe5c8 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 18 Apr 2023 16:25:36 +0800
Subject: [PATCH v2] Support "Right Semi Join" plan shapes
---
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 34 ++-
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/include/nodes/nodes.h | 1 +
src/test/regress/expected/join.out | 20 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
10 files changed, 185 insertions(+), 147 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9222ea191a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1561,6 +1561,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 980746128b..41bcbf5732 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -534,6 +534,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -552,8 +560,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or
+ * it's a right-semijoin, but we'll avoid the branch and
+ * just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -780,6 +789,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 059e605e04..94f3e31d90 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -2301,13 +2301,14 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-anti or right-semi join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_ANTI)
+ save_jointype == JOIN_RIGHT_ANTI ||
+ save_jointype == JOIN_RIGHT_SEMI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
@@ -2330,14 +2331,14 @@ hash_inner_and_outer(PlannerInfo *root,
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
- * *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * *mergejoin_allowed is normally set to true, but it is set to false if this
+ * is a right-semi join, or this is a right/right-anti/full join and there are
+ * nonmergejoinable join clauses. The executor's mergejoin machinery cannot
+ * handle such cases, so we have to avoid generating a mergejoin plan. (Note
+ * that this flag does NOT consider whether there are actually any
+ * mergejoinable clauses. This is correct because in some cases we need to
+ * build a clauseless mergejoin. Simply returning NIL is therefore not enough
+ * to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2361,6 +2362,15 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d03ace50a1..0f165bb224 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -973,6 +973,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index e53ea84224..4eda6eb75b 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1095,6 +1095,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f8e8fe699a..013aaf6040 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -317,6 +317,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286..f5ad84c169 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1909,18 +1909,18 @@ select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
- Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
+ QUERY PLAN
+-------------------------------------------------
+ Hash Right Semi Join
+ Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
- -> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
- -> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Hash Join
+ Hash Cond: (a.tenthous = i4.f1)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> Seq Scan on int4_tbl i4
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6560fe2416..26da5692cd 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2375,24 +2375,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2584,24 +2584,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2868,25 +2868,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3265,27 +3265,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3455,27 +3458,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3671,25 +3677,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3819,28 +3825,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index d88353d496..d524989f25 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -999,26 +999,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281..4172468aec 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2675,10 +2675,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
--
2.31.0
On Thu, Aug 10, 2023 at 3:24 PM Richard Guo <guofenglinux@gmail.com> wrote:
The cfbot reminds that this patch does not apply any more, so rebase it
to v2.
Attached is another rebase over the latest master. Any feedback is
appreciated.
Thanks
Richard
Attachments:
v3-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v3-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 1365e36ba86871094437b2907b1505b78055ec47 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 18 Apr 2023 16:25:36 +0800
Subject: [PATCH v3] Support "Right Semi Join" plan shapes
---
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 45 ++--
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/include/nodes/nodes.h | 1 +
src/test/regress/expected/join.out | 20 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
10 files changed, 194 insertions(+), 149 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f1d71bc54e..40b4765a04 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1569,6 +1569,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 25a2d78f15..8a95f51a33 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -534,6 +534,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -552,8 +560,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or
+ * it's a right-semijoin, but we'll avoid the branch and
+ * just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -780,6 +789,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index b2916ad690..31cb9b814b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -287,8 +287,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1720,6 +1720,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2289,13 +2296,14 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-anti or right-semi join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_ANTI)
+ save_jointype == JOIN_RIGHT_ANTI ||
+ save_jointype == JOIN_RIGHT_SEMI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
@@ -2318,14 +2326,14 @@ hash_inner_and_outer(PlannerInfo *root,
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
- * *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * *mergejoin_allowed is normally set to true, but it is set to false if this
+ * is a right-semi join, or this is a right/right-anti/full join and there are
+ * nonmergejoinable join clauses. The executor's mergejoin machinery cannot
+ * handle such cases, so we have to avoid generating a mergejoin plan. (Note
+ * that this flag does NOT consider whether there are actually any
+ * mergejoinable clauses. This is correct because in some cases we need to
+ * build a clauseless mergejoin. Simply returning NIL is therefore not enough
+ * to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2349,6 +2357,15 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d03ace50a1..0f165bb224 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -973,6 +973,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index fdb60aaa8d..476f286065 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1097,6 +1097,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4c32682e4c..ccef7b0d0c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -317,6 +317,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 446959e3c5..0cac52138e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1909,18 +1909,18 @@ select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
- Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
+ QUERY PLAN
+-------------------------------------------------
+ Hash Right Semi Join
+ Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
- -> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
- -> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Hash Join
+ Hash Cond: (a.tenthous = i4.f1)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> Seq Scan on int4_tbl i4
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6560fe2416..26da5692cd 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2375,24 +2375,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2584,24 +2584,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2868,25 +2868,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3265,27 +3265,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3455,27 +3458,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3671,25 +3677,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3819,28 +3825,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index d88353d496..d524989f25 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -999,26 +999,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index a73c1f90c4..c9951687c2 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2672,10 +2672,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
--
2.31.0
On Wed, 1 Nov 2023 at 11:25, Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Aug 10, 2023 at 3:24 PM Richard Guo <guofenglinux@gmail.com> wrote:
The cfbot reminds that this patch does not apply any more, so rebase it
to v2.Attached is another rebase over the latest master. Any feedback is
appreciated.
One of the tests in CFBot has failed at [1] with:
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
(SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
More details are available at [2]https://api.cirrus-ci.com/v1/artifact/task/4868751326183424/testrun/build/testrun/postgres_fdw/regress/regression.diffs.
[1]: https://cirrus-ci.com/task/4868751326183424
[2]: https://api.cirrus-ci.com/v1/artifact/task/4868751326183424/testrun/build/testrun/postgres_fdw/regress/regression.diffs
Regards,
Vignesh
On Sun, Jan 7, 2024 at 3:03 PM vignesh C <vignesh21@gmail.com> wrote:
One of the tests in CFBot has failed at [1] with: - Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) - Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST -(4 rows) + Sort Key: t1.c1 + -> Foreign Scan + Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) +(7 rows)
Thanks. I looked into it and have figured out why the plan differs.
With this patch the SEMI JOIN that is pushed down to the remote server
is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
implemented using JOIN_SEMI. Consequently, this leads to changes in the
costs of the paths: path with the sort pushed down to remote server, and
path with the sort added atop the foreign join. And at last the latter
one wins by a slim margin.
I think we can simply update the expected file to fix this plan diff, as
attached.
Thanks
Richard
Attachments:
v4-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v4-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 970294c816e49fda76c7e9251e576e9dd64311cd Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 18 Apr 2023 16:25:36 +0800
Subject: [PATCH v4] Support "Right Semi Join" plan shapes
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +-
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 45 ++--
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/include/nodes/nodes.h | 1 +
src/test/regress/expected/join.out | 20 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
11 files changed, 203 insertions(+), 155 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index d83f6ae8cb..28529ef0fd 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4049,13 +4049,16 @@ RESET enable_sort;
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 3d590a6b9f..38ec2430d4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1569,6 +1569,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index 1cbec4647c..a638070302 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -534,6 +534,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -552,8 +560,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or
+ * it's a right-semijoin, but we'll avoid the branch and
+ * just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -780,6 +789,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 3fd5a24fad..b4b4d508a7 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -287,8 +287,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1720,6 +1720,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2289,13 +2296,14 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-anti or right-semi join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_ANTI)
+ save_jointype == JOIN_RIGHT_ANTI ||
+ save_jointype == JOIN_RIGHT_SEMI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
@@ -2318,14 +2326,14 @@ hash_inner_and_outer(PlannerInfo *root,
* Select mergejoin clauses that are usable for a particular join.
* Returns a list of RestrictInfo nodes for those clauses.
*
- * *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * *mergejoin_allowed is normally set to true, but it is set to false if this
+ * is a right-semi join, or this is a right/right-anti/full join and there are
+ * nonmergejoinable join clauses. The executor's mergejoin machinery cannot
+ * handle such cases, so we have to avoid generating a mergejoin plan. (Note
+ * that this flag does NOT consider whether there are actually any
+ * mergejoinable clauses. This is correct because in some cases we need to
+ * build a clauseless mergejoin. Simply returning NIL is therefore not enough
+ * to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2349,6 +2357,15 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..366c55dd16 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -973,6 +973,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ca94a31f71..814c6fec6e 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1097,6 +1097,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2969dd831b..4daef576ab 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -296,6 +296,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 321b6a7147..738e571e07 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1909,18 +1909,18 @@ select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
- Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
+ QUERY PLAN
+-------------------------------------------------
+ Hash Right Semi Join
+ Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
- -> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
- -> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Hash Join
+ Hash Cond: (a.tenthous = i4.f1)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> Seq Scan on int4_tbl i4
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6560fe2416..26da5692cd 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2375,24 +2375,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2584,24 +2584,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2868,25 +2868,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3265,27 +3265,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3455,27 +3458,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3671,25 +3677,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3819,28 +3825,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index d88353d496..d524989f25 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -999,26 +999,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281..4172468aec 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2675,10 +2675,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
--
2.31.0
Hi vignesh C I saw this path has been passed (
https://cirrus-ci.com/build/6109321080078336),can we push it?
Best wish
Richard Guo <guofenglinux@gmail.com> 于2024年1月9日周二 18:49写道:
Show quoted text
On Sun, Jan 7, 2024 at 3:03 PM vignesh C <vignesh21@gmail.com> wrote:
One of the tests in CFBot has failed at [1] with: - Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) - Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST -(4 rows) + Sort Key: t1.c1 + -> Foreign Scan + Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2) + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) +(7 rows)Thanks. I looked into it and have figured out why the plan differs.
With this patch the SEMI JOIN that is pushed down to the remote server
is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
implemented using JOIN_SEMI. Consequently, this leads to changes in the
costs of the paths: path with the sort pushed down to remote server, and
path with the sort added atop the foreign join. And at last the latter
one wins by a slim margin.I think we can simply update the expected file to fix this plan diff, as
attached.Thanks
Richard
On Mon, 22 Jan 2024 at 11:27, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Hi vignesh C I saw this path has been passed (https://cirrus-ci.com/build/6109321080078336),can we push it?
If you have found no comments from your review and testing, let's mark
it as "ready for committer".
Regards,
Vignesh
Hi vignesh C
Many thanks, I have marked it to "ready for committer"
Best wish
vignesh C <vignesh21@gmail.com> 于2024年1月23日周二 10:56写道:
Show quoted text
On Mon, 22 Jan 2024 at 11:27, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Hi vignesh C I saw this path has been passed (
https://cirrus-ci.com/build/6109321080078336),can we push it?
If you have found no comments from your review and testing, let's mark
it as "ready for committer".Regards,
Vignesh
Hi! Thank you for your work on this subject.
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook
function, which is in the add_paths_to_joinrel function, which allows
you to create a custom node. What do you think?
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi Alena Rybakina
I saw this code snippet also disable mergejoin ,I think it same effect
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
Regards
Alena Rybakina <lena.ribackina@yandex.ru> 于2024年1月30日周二 14:51写道:
Show quoted text
Hi! Thank you for your work on this subject.
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook
function, which is in the add_paths_to_joinrel function, which allows
you to create a custom node. What do you think?--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
HI Richard
Now it is starting the last commitfest for v17, can you respond to
Alena Rybakina points?
Regards
On Thu, 8 Feb 2024 at 13:50, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Show quoted text
Hi Alena Rybakina I saw this code snippet also disable mergejoin ,I think it same effect + /* + * For now we do not support RIGHT_SEMI join in mergejoin. + */ + if (jointype == JOIN_RIGHT_SEMI) + { + *mergejoin_allowed = false; + return NIL; + } +Regards
Alena Rybakina <lena.ribackina@yandex.ru> 于2024年1月30日周二 14:51写道:
Hi! Thank you for your work on this subject.
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook
function, which is in the add_paths_to_joinrel function, which allows
you to create a custom node. What do you think?--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Mar 4, 2024 at 10:33 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
HI Richard
Now it is starting the last commitfest for v17, can you respond to
Alena Rybakina points?
Thanks for reminding. Will do that soon.
Thanks
Richard
On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru>
wrote:
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Hmm, I don't see why this is necessary. The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.
Thanks
Richard
Hi Richard
Agree +1 ,I think can push now.
Richard
On Tue, 5 Mar 2024 at 10:44, Richard Guo <guofenglinux@gmail.com> wrote:
Show quoted text
On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru>
wrote:I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).Hmm, I don't see why this is necessary. The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.Thanks
Richard
To be honest, I didn't see it in the code, could you tell me where they
are, please?
On 05.03.2024 05:44, Richard Guo wrote:
On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina
<lena.ribackina@yandex.ru> wrote:I have reviewed your patch and I think it is better to add an
Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections
(NestedLoop
and MergeJoin).Hmm, I don't see why this is necessary. The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.Thanks
Richard
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
Hi Alena Rybakina
For merge join
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
Tanks
On Wed, 6 Mar 2024 at 04:10, Alena Rybakina <lena.ribackina@yandex.ru>
wrote:
Show quoted text
To be honest, I didn't see it in the code, could you tell me where they
are, please?
On 05.03.2024 05:44, Richard Guo wrote:On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru>
wrote:I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).Hmm, I don't see why this is necessary. The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.Thanks
Richard--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 06.03.2024 05:23, wenhui qiu wrote:
Hi Alena Rybakina For merge join + /* + * For now we do not support RIGHT_SEMI join in mergejoin. + */ + if (jointype == JOIN_RIGHT_SEMI) + { + *mergejoin_allowed = false; + return NIL; + } + Tanks
Yes, I see it, thank you. Sorry for the noise.
--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
Here is another rebase with a commit message to help review. I also
tweaked some comments.
Thanks
Richard
Attachments:
v5-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v5-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 4322c793da64fb6b244d9debd0fea1aa54d3116d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 25 Apr 2024 02:51:42 +0000
Subject: [PATCH v5] Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +-
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 43 ++--
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/include/nodes/nodes.h | 9 +-
src/test/regress/expected/join.out | 20 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
11 files changed, 206 insertions(+), 158 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 078b8a966f..38a1d97b30 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4106,13 +4106,16 @@ RESET enable_sort;
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c0c73aa3c9..0bc4fa5f1a 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1743,6 +1743,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..efff8649f8 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -533,6 +533,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -551,8 +559,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or if
+ * we are in a right-semijoin, but we'll avoid the branch
+ * and just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -779,6 +788,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5be8da9e09..aed8535377 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -288,8 +288,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1728,6 +1728,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2297,13 +2304,14 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-anti or right-semi join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_ANTI)
+ save_jointype == JOIN_RIGHT_ANTI ||
+ save_jointype == JOIN_RIGHT_SEMI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
@@ -2327,13 +2335,13 @@ hash_inner_and_outer(PlannerInfo *root,
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
+ * machinery cannot handle such cases, so we have to avoid generating a
+ * mergejoin plan. (Note that this flag does NOT consider whether there are
+ * actually any mergejoinable clauses. This is correct because in some
+ * cases we need to build a clauseless mergejoin. Simply returning NIL is
+ * therefore not enough to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2357,6 +2365,15 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index f3a9412d18..44be612ec2 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -991,6 +991,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 8b258cbef9..80cfe89f8c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1311,6 +1311,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 855009fd6e..0d71d821f7 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -306,6 +306,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
@@ -322,10 +323,10 @@ typedef enum JoinType
/*
* OUTER joins are those for which pushed-down quals must behave differently
- * from the join's own quals. This is in fact everything except INNER and
- * SEMI joins. However, this macro must also exclude the JOIN_UNIQUE symbols
- * since those are temporary proxies for what will eventually be an INNER
- * join.
+ * from the join's own quals. This is in fact everything except INNER, SEMI
+ * and RIGHT_SEMI joins. However, this macro must also exclude the
+ * JOIN_UNIQUE symbols since those are temporary proxies for what will
+ * eventually be an INNER join.
*
* Note: semijoins are a hybrid case, but we choose to treat them as not
* being outer joins. This is okay principally because the SQL syntax makes
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2..6050860b2e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1909,18 +1909,18 @@ select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
- Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
+ QUERY PLAN
+-------------------------------------------------
+ Hash Right Semi Join
+ Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
- -> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
- -> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Hash Join
+ Hash Cond: (a.tenthous = i4.f1)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> Seq Scan on int4_tbl i4
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6d07f86b9b..53591a4f2d 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2543,24 +2543,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2752,24 +2752,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -3036,25 +3036,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3433,27 +3433,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3623,27 +3626,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3839,25 +3845,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3987,28 +3993,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 87273fa635..c96285d1bb 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1076,26 +1076,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568bc4..9c21b76800 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3287,10 +3287,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
--
2.34.1
Hi Richard
Thank you so much for your tireless work on this,I see the new version
of the patch improves some of the comments .I think it can commit in July
Thanks
On Thu, 25 Apr 2024 at 11:28, Richard Guo <guofenglinux@gmail.com> wrote:
Show quoted text
Here is another rebase with a commit message to help review. I also
tweaked some comments.Thanks
Richard
Hi, Richard
On Apr 25, 2024, at 11:28, Richard Guo <guofenglinux@gmail.com> wrote:
Here is another rebase with a commit message to help review. I also
tweaked some comments.
Thank you for updating the patch, here are some comments on the v5 patch.
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
How about adding some reasons here?
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
Maybe we can put the right-semi join together with the right/right-anti/full
join. Is there any other significance by putting it separately?
Maybe the following comments also should be updated. Right?
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5482ab85a7..791cbc551e 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -455,8 +455,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
* point of the available_rels machinations is to ensure that we only
* pull up quals for which that's okay.
*
- * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
- * JOIN_RIGHT_ANTI jointypes here.
+ * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+ * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
*/
switch (j->jointype)
{
@@ -2951,6 +2951,7 @@ reduce_outer_joins_pass2(Node *jtnode,
* so there's no way that upper quals could refer to their
* righthand sides, and no point in checking. We don't expect
* to see JOIN_RIGHT_ANTI yet.
+ * Does JOIN_RIGHT_SEMI is expected here?
*/
break;
default:
Thank you for reviewing.
On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
+ /* + * For now we do not support RIGHT_SEMI join in mergejoin or nestloop + * join. + */ + if (jointype == JOIN_RIGHT_SEMI) + return; +How about adding some reasons here?
I've included a brief explanation in select_mergejoin_clauses.
+ * this is a right-semi join, or this is a right/right-anti/full join and + * there are nonmergejoinable join clauses. The executor's mergejoinMaybe we can put the right-semi join together with the right/right-anti/full
join. Is there any other significance by putting it separately?
I don't think so. The logic is different: for right-semi join we will
always set *mergejoin_allowed to false, while for right/right-anti/full
join it is set to false only if there are nonmergejoinable join clauses.
Maybe the following comments also should be updated. Right?
Correct. And there are a few more places where we need to mention
JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
for SpecialJoinInfo.
I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).
--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
So I've changed this test case a bit so that it is still testing what it
is supposed to test.
In passing, I've also updated the commit message to clarify that this
patch does not address the support of "Right Semi Join" for merge joins.
Thanks
Richard
Attachments:
v6-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v6-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From d01dd46997f94984ad40d087cbbf9976710af1f4 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 6 Jun 2024 12:20:43 +0900
Subject: [PATCH v6] Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.
Merge join could likely support "Right Semi Join" too. However, the
benefit of swapping inputs tends to be small here, so we do not address
that in this patch.
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +-
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 14 +-
src/backend/optimizer/path/joinpath.c | 42 +++-
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/backend/optimizer/prep/prepjointree.c | 6 +-
src/include/nodes/nodes.h | 9 +-
src/include/nodes/pathnodes.h | 6 +-
src/test/regress/expected/join.out | 18 +-
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
src/test/regress/sql/join.sql | 8 +-
14 files changed, 215 insertions(+), 166 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ea566d5034..1f22309194 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4127,13 +4127,16 @@ RESET enable_sort;
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 94511a5a02..30de9de9d4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1749,6 +1749,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..efff8649f8 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -533,6 +533,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -551,8 +559,9 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or if
+ * we are in a right-semijoin, but we'll avoid the branch
+ * and just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -779,6 +788,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5be8da9e09..40eb58341c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -288,8 +288,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1728,6 +1728,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2297,12 +2304,13 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-semi or right-anti join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
+ save_jointype == JOIN_RIGHT_SEMI ||
save_jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
@@ -2327,13 +2335,13 @@ hash_inner_and_outer(PlannerInfo *root,
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
+ * machinery cannot handle such cases, so we have to avoid generating a
+ * mergejoin plan. (Note that this flag does NOT consider whether there are
+ * actually any mergejoinable clauses. This is correct because in some
+ * cases we need to build a clauseless mergejoin. Simply returning NIL is
+ * therefore not enough to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2357,6 +2365,16 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin: the benefit of
+ * swapping inputs tends to be small here.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index db475e25b1..a3677f824f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -991,6 +991,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 416fc4e240..e25798972f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1294,6 +1294,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5482ab85a7..969e257f70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -455,8 +455,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
* point of the available_rels machinations is to ensure that we only
* pull up quals for which that's okay.
*
- * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
- * JOIN_RIGHT_ANTI jointypes here.
+ * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+ * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
*/
switch (j->jointype)
{
@@ -2950,7 +2950,7 @@ reduce_outer_joins_pass2(Node *jtnode,
* These could only have been introduced by pull_up_sublinks,
* so there's no way that upper quals could refer to their
* righthand sides, and no point in checking. We don't expect
- * to see JOIN_RIGHT_ANTI yet.
+ * to see JOIN_RIGHT_SEMI or JOIN_RIGHT_ANTI yet.
*/
break;
default:
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 855009fd6e..0d71d821f7 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -306,6 +306,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
@@ -322,10 +323,10 @@ typedef enum JoinType
/*
* OUTER joins are those for which pushed-down quals must behave differently
- * from the join's own quals. This is in fact everything except INNER and
- * SEMI joins. However, this macro must also exclude the JOIN_UNIQUE symbols
- * since those are temporary proxies for what will eventually be an INNER
- * join.
+ * from the join's own quals. This is in fact everything except INNER, SEMI
+ * and RIGHT_SEMI joins. However, this macro must also exclude the
+ * JOIN_UNIQUE symbols since those are temporary proxies for what will
+ * eventually be an INNER join.
*
* Note: semijoins are a hybrid case, but we choose to treat them as not
* being outer joins. This is okay principally because the SQL syntax makes
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..14ccfc1ac1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2823,9 +2823,9 @@ typedef struct PlaceHolderVar
* min_lefthand and min_righthand for higher joins.)
*
* jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
- * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_ANTI either.
- * So the allowed values of jointype in a join_info_list member are only
- * LEFT, FULL, SEMI, or ANTI.
+ * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_SEMI or
+ * JOIN_RIGHT_ANTI either. So the allowed values of jointype in a
+ * join_info_list member are only LEFT, FULL, SEMI, or ANTI.
*
* ojrelid is the RT index of the join RTE representing this outer join,
* if there is one. It is zero when jointype is INNER or SEMI, and can be
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a676..826cb9c4e3 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1905,22 +1905,22 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous;
QUERY PLAN
----------------------------------------------
Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
- Join Filter: (a.fivethous <> b.fivethous)
+ Hash Cond: (b.twothousand = c.twothousand)
+ Join Filter: (b.fivethous <> c.fivethous)
-> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
+ Hash Cond: (a.tenthous = b.tenthous)
-> Seq Scan on tenk1 a
-> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on tenk1 b
+ -> Seq Scan on tenk1 c
(10 rows)
--
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6d07f86b9b..53591a4f2d 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2543,24 +2543,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2752,24 +2752,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -3036,25 +3036,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3433,27 +3433,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3623,27 +3626,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3839,25 +3845,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3987,28 +3993,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 87273fa635..c96285d1bb 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1076,26 +1076,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568bc4..9c21b76800 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3287,10 +3287,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7ba6..baa597fece 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -214,10 +214,10 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous;
--
--
2.43.0
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote:
Thank you for reviewing.
On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
+ /* + * For now we do not support RIGHT_SEMI join in mergejoin or nestloop + * join. + */ + if (jointype == JOIN_RIGHT_SEMI) + return; +How about adding some reasons here?
I've included a brief explanation in select_mergejoin_clauses.
Thank you for updating the patch.
+ * this is a right-semi join, or this is a right/right-anti/full join and + * there are nonmergejoinable join clauses. The executor's mergejoinMaybe we can put the right-semi join together with the right/right-anti/full
join. Is there any other significance by putting it separately?I don't think so. The logic is different: for right-semi join we will
always set *mergejoin_allowed to false, while for right/right-anti/full
join it is set to false only if there are nonmergejoinable join clauses.
Got it. Thanks for the explanation.
Maybe the following comments also should be updated. Right?
Correct. And there are a few more places where we need to mention
JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
for SpecialJoinInfo.I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;So I've changed this test case a bit so that it is still testing what it
is supposed to test.In passing, I've also updated the commit message to clarify that this
patch does not address the support of "Right Semi Join" for merge joins.
Tested and looks good to me!
--
Regrads,
Japin Li
Hi Japin Li
Thank you for your reviewing ,This way the notes are more accurate and
complete. Thanks also to the author for updating the patch ,I also tested
the new patch ,It looks good to me
Regrads
Japin Li <japinli@hotmail.com> 于2024年6月25日周二 08:51写道:
Show quoted text
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote:
Thank you for reviewing.
On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
+ /* + * For now we do not support RIGHT_SEMI join in mergejoin ornestloop
+ * join. + */ + if (jointype == JOIN_RIGHT_SEMI) + return; +How about adding some reasons here?
I've included a brief explanation in select_mergejoin_clauses.
Thank you for updating the patch.
+ * this is a right-semi join, or this is a right/right-anti/full join
and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
Maybe we can put the right-semi join together with the
right/right-anti/full
join. Is there any other significance by putting it separately?
I don't think so. The logic is different: for right-semi join we will
always set *mergejoin_allowed to false, while for right/right-anti/full
join it is set to false only if there are nonmergejoinable join clauses.Got it. Thanks for the explanation.
Maybe the following comments also should be updated. Right?
Correct. And there are a few more places where we need to mention
JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
for SpecialJoinInfo.I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <>b.fivethous)
and i4.f1 = a.tenthous;
So I've changed this test case a bit so that it is still testing what it
is supposed to test.In passing, I've also updated the commit message to clarify that this
patch does not address the support of "Right Semi Join" for merge joins.Tested and looks good to me!
--
Regrads,
Japin Li
On Mon, Jun 24, 2024 at 5:59 PM Richard Guo <guofenglinux@gmail.com> wrote:
I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;So I've changed this test case a bit so that it is still testing what it
is supposed to test.
I've refined this test case further to make it more stable by using an
additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus
blank line in ExecHashJoinImpl(). I've removed it in the v7 patch.
Thanks
Richard
Attachments:
v7-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v7-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 49963e35ecdb2c439e8d2477706b0fc5e7135d6f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 6 Jun 2024 12:20:43 +0900
Subject: [PATCH v7] Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.
Merge join could likely support "Right Semi Join" too. However, the
benefit of swapping inputs tends to be small here, so we do not address
that in this patch.
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +-
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 15 +-
src/backend/optimizer/path/joinpath.c | 42 +++-
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/backend/optimizer/prep/prepjointree.c | 6 +-
src/include/nodes/nodes.h | 9 +-
src/include/nodes/pathnodes.h | 6 +-
src/test/regress/expected/join.out | 27 +--
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
src/test/regress/sql/join.sql | 8 +-
14 files changed, 220 insertions(+), 171 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ea566d5034..1f22309194 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4127,13 +4127,16 @@ RESET enable_sort;
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 94511a5a02..30de9de9d4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1749,6 +1749,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..c46764023d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -533,6 +533,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -549,10 +557,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
{
node->hj_MatchedOuter = true;
-
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or if
+ * we are in a right-semijoin, but we'll avoid the branch
+ * and just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -779,6 +787,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5be8da9e09..40eb58341c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -288,8 +288,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1728,6 +1728,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2297,12 +2304,13 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-semi or right-anti join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
+ save_jointype == JOIN_RIGHT_SEMI ||
save_jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
@@ -2327,13 +2335,13 @@ hash_inner_and_outer(PlannerInfo *root,
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
+ * machinery cannot handle such cases, so we have to avoid generating a
+ * mergejoin plan. (Note that this flag does NOT consider whether there are
+ * actually any mergejoinable clauses. This is correct because in some
+ * cases we need to build a clauseless mergejoin. Simply returning NIL is
+ * therefore not enough to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2357,6 +2365,16 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin: the benefit of
+ * swapping inputs tends to be small here.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index db475e25b1..a3677f824f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -991,6 +991,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 416fc4e240..e25798972f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1294,6 +1294,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5482ab85a7..969e257f70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -455,8 +455,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
* point of the available_rels machinations is to ensure that we only
* pull up quals for which that's okay.
*
- * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
- * JOIN_RIGHT_ANTI jointypes here.
+ * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+ * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
*/
switch (j->jointype)
{
@@ -2950,7 +2950,7 @@ reduce_outer_joins_pass2(Node *jtnode,
* These could only have been introduced by pull_up_sublinks,
* so there's no way that upper quals could refer to their
* righthand sides, and no point in checking. We don't expect
- * to see JOIN_RIGHT_ANTI yet.
+ * to see JOIN_RIGHT_SEMI or JOIN_RIGHT_ANTI yet.
*/
break;
default:
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 855009fd6e..0d71d821f7 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -306,6 +306,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
@@ -322,10 +323,10 @@ typedef enum JoinType
/*
* OUTER joins are those for which pushed-down quals must behave differently
- * from the join's own quals. This is in fact everything except INNER and
- * SEMI joins. However, this macro must also exclude the JOIN_UNIQUE symbols
- * since those are temporary proxies for what will eventually be an INNER
- * join.
+ * from the join's own quals. This is in fact everything except INNER, SEMI
+ * and RIGHT_SEMI joins. However, this macro must also exclude the
+ * JOIN_UNIQUE symbols since those are temporary proxies for what will
+ * eventually be an INNER join.
*
* Note: semijoins are a hybrid case, but we choose to treat them as not
* being outer joins. This is okay principally because the SQL syntax makes
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..14ccfc1ac1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2823,9 +2823,9 @@ typedef struct PlaceHolderVar
* min_lefthand and min_righthand for higher joins.)
*
* jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
- * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_ANTI either.
- * So the allowed values of jointype in a join_info_list member are only
- * LEFT, FULL, SEMI, or ANTI.
+ * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_SEMI or
+ * JOIN_RIGHT_ANTI either. So the allowed values of jointype in a
+ * join_info_list member are only LEFT, FULL, SEMI, or ANTI.
*
* ojrelid is the RT index of the join RTE representing this outer join,
* if there is one. It is zero when jointype is INNER or SEMI, and can be
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a676..9142dab171 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1905,23 +1905,24 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous and a.tenthous < 5000;
+ QUERY PLAN
+-----------------------------------------------
Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
- Join Filter: (a.fivethous <> b.fivethous)
+ Hash Cond: (b.twothousand = c.twothousand)
+ Join Filter: (b.fivethous <> c.fivethous)
-> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
+ Hash Cond: (b.tenthous = a.tenthous)
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 a
+ Filter: (tenthous < 5000)
-> Hash
- -> Seq Scan on tenk1 b
-(10 rows)
+ -> Seq Scan on tenk1 c
+(11 rows)
--
-- More complicated constructs
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6d07f86b9b..53591a4f2d 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2543,24 +2543,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2752,24 +2752,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -3036,25 +3036,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3433,27 +3433,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3623,27 +3626,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3839,25 +3845,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3987,28 +3993,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 87273fa635..c96285d1bb 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1076,26 +1076,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568bc4..9c21b76800 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3287,10 +3287,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7ba6..e3d2652083 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -214,10 +214,10 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous and a.tenthous < 5000;
--
--
2.43.0
On Fri, Jun 28, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Jun 24, 2024 at 5:59 PM Richard Guo <guofenglinux@gmail.com> wrote:
I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;So I've changed this test case a bit so that it is still testing what it
is supposed to test.I've refined this test case further to make it more stable by using an
additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus
blank line in ExecHashJoinImpl(). I've removed it in the v7 patch.
BTW, I've also verified the empty-rel optimization for hash join and
AFAICT it works correctly for the new right-semi join.
Thanks
Richard
On Fri, Jun 28, 2024 at 3:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jun 28, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com> wrote:
I've refined this test case further to make it more stable by using an
additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus
blank line in ExecHashJoinImpl(). I've removed it in the v7 patch.BTW, I've also verified the empty-rel optimization for hash join and
AFAICT it works correctly for the new right-semi join.
Here is a new rebase.
Barring objections, I'm planning to push it soon.
Thanks
Richard
Attachments:
v8-0001-Support-Right-Semi-Join-plan-shapes.patchapplication/octet-stream; name=v8-0001-Support-Right-Semi-Join-plan-shapes.patchDownload
From 0ca18687f50f6793d4d2786fd47565e3b71f52d7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 4 Jul 2024 17:52:00 +0900
Subject: [PATCH v8] Support "Right Semi Join" plan shapes
Hash joins can support semijoin with the LHS input on the right, using
the existing logic for inner join, combined with the assurance that only
the first match for each inner tuple is considered, which can be
achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very
useful in some cases since we may now have the option to hash the
smaller table instead of the larger.
Merge join could likely support "Right Semi Join" too. However, the
benefit of swapping inputs tends to be small here, so we do not address
that in this patch.
Note that this patch also modifies a test query in join.sql to ensure it
continues testing as intended. With this patch the original query would
result in a right-semi-join rather than semi-join, compromising its
original purpose of testing the fix for neqjoinsel's behavior for
semi-joins.
Author: Richard Guo
Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li
Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 15 +-
src/backend/commands/explain.c | 3 +
src/backend/executor/nodeHashjoin.c | 15 +-
src/backend/optimizer/path/joinpath.c | 42 +++-
src/backend/optimizer/path/joinrels.c | 3 +
src/backend/optimizer/path/pathkeys.c | 3 +
src/backend/optimizer/prep/prepjointree.c | 6 +-
src/include/nodes/nodes.h | 9 +-
src/include/nodes/pathnodes.h | 6 +-
src/test/regress/expected/join.out | 27 +--
src/test/regress/expected/partition_join.out | 216 +++++++++---------
src/test/regress/expected/select_parallel.out | 32 +--
src/test/regress/expected/updatable_views.out | 6 +-
src/test/regress/sql/join.sql | 8 +-
14 files changed, 220 insertions(+), 171 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index ea566d5034..1f22309194 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4127,13 +4127,16 @@ RESET enable_sort;
-- subquery using immutable function (can be sent to remote)
PREPARE st3(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c5) = '1970-01-17'::date) ORDER BY c1;
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st3(10, 20);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Foreign Scan
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
- Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
- Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY r1."C 1" ASC NULLS LAST
-(4 rows)
+ Sort Key: t1.c1
+ -> Foreign Scan
+ Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+ Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+ Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)
EXECUTE st3(10, 20);
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 94511a5a02..30de9de9d4 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1749,6 +1749,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
case JOIN_ANTI:
jointype = "Anti";
break;
+ case JOIN_RIGHT_SEMI:
+ jointype = "Right Semi";
+ break;
case JOIN_RIGHT_ANTI:
jointype = "Right Anti";
break;
diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c
index dbf114cd5e..c46764023d 100644
--- a/src/backend/executor/nodeHashjoin.c
+++ b/src/backend/executor/nodeHashjoin.c
@@ -533,6 +533,14 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
}
}
+ /*
+ * In a right-semijoin, we only need the first match for each
+ * inner tuple.
+ */
+ if (node->js.jointype == JOIN_RIGHT_SEMI &&
+ HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
+ continue;
+
/*
* We've got a match, but still need to test non-hashed quals.
* ExecScanHashBucket already set up all the state needed to
@@ -549,10 +557,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
{
node->hj_MatchedOuter = true;
-
/*
- * This is really only needed if HJ_FILL_INNER(node), but
- * we'll avoid the branch and just set it always.
+ * This is really only needed if HJ_FILL_INNER(node) or if
+ * we are in a right-semijoin, but we'll avoid the branch
+ * and just set it always.
*/
if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
@@ -779,6 +787,7 @@ ExecInitHashJoin(HashJoin *node, EState *estate, int eflags)
{
case JOIN_INNER:
case JOIN_SEMI:
+ case JOIN_RIGHT_SEMI:
break;
case JOIN_LEFT:
case JOIN_ANTI:
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5be8da9e09..40eb58341c 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -288,8 +288,8 @@ add_paths_to_joinrel(PlannerInfo *root,
* sorted. This includes both nestloops and mergejoins where the outer
* path is already ordered. Again, skip this if we can't mergejoin.
* (That's okay because we know that nestloop can't handle
- * right/right-anti/full joins at all, so it wouldn't work in the
- * prohibited cases either.)
+ * right/right-anti/right-semi/full joins at all, so it wouldn't work in
+ * the prohibited cases either.)
*/
if (mergejoin_allowed)
match_unsorted_outer(root, joinrel, outerrel, innerrel,
@@ -1728,6 +1728,13 @@ match_unsorted_outer(PlannerInfo *root,
Path *matpath = NULL;
ListCell *lc1;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+ * join.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ return;
+
/*
* Nestloop only supports inner, left, semi, and anti joins. Also, if we
* are doing a right, right-anti or full mergejoin, we must use *all* the
@@ -2297,12 +2304,13 @@ hash_inner_and_outer(PlannerInfo *root,
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
* path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, or right-anti join, we can't use
- * parallelism (building the hash table in each backend) because
- * no one process has all the match bits.
+ * inner path. If full, right, right-semi or right-anti join, we
+ * can't use parallelism (building the hash table in each backend)
+ * because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
save_jointype == JOIN_RIGHT ||
+ save_jointype == JOIN_RIGHT_SEMI ||
save_jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
@@ -2327,13 +2335,13 @@ hash_inner_and_outer(PlannerInfo *root,
* Returns a list of RestrictInfo nodes for those clauses.
*
* *mergejoin_allowed is normally set to true, but it is set to false if
- * this is a right/right-anti/full join and there are nonmergejoinable join
- * clauses. The executor's mergejoin machinery cannot handle such cases, so
- * we have to avoid generating a mergejoin plan. (Note that this flag does
- * NOT consider whether there are actually any mergejoinable clauses. This is
- * correct because in some cases we need to build a clauseless mergejoin.
- * Simply returning NIL is therefore not enough to distinguish safe from
- * unsafe cases.)
+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses. The executor's mergejoin
+ * machinery cannot handle such cases, so we have to avoid generating a
+ * mergejoin plan. (Note that this flag does NOT consider whether there are
+ * actually any mergejoinable clauses. This is correct because in some
+ * cases we need to build a clauseless mergejoin. Simply returning NIL is
+ * therefore not enough to distinguish safe from unsafe cases.)
*
* We also mark each selected RestrictInfo to show which side is currently
* being considered as outer. These are transient markings that are only
@@ -2357,6 +2365,16 @@ select_mergejoin_clauses(PlannerInfo *root,
bool have_nonmergeable_joinclause = false;
ListCell *l;
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin: the benefit of
+ * swapping inputs tends to be small here.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
foreach(l, restrictlist)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index db475e25b1..a3677f824f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -991,6 +991,9 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_SEMI, sjinfo,
restrictlist);
+ add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ JOIN_RIGHT_SEMI, sjinfo,
+ restrictlist);
}
/*
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 416fc4e240..e25798972f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -1294,6 +1294,9 @@ build_join_pathkeys(PlannerInfo *root,
JoinType jointype,
List *outer_pathkeys)
{
+ /* RIGHT_SEMI should not come here */
+ Assert(jointype != JOIN_RIGHT_SEMI);
+
if (jointype == JOIN_FULL ||
jointype == JOIN_RIGHT ||
jointype == JOIN_RIGHT_ANTI)
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5482ab85a7..969e257f70 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -455,8 +455,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
* point of the available_rels machinations is to ensure that we only
* pull up quals for which that's okay.
*
- * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
- * JOIN_RIGHT_ANTI jointypes here.
+ * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+ * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
*/
switch (j->jointype)
{
@@ -2950,7 +2950,7 @@ reduce_outer_joins_pass2(Node *jtnode,
* These could only have been introduced by pull_up_sublinks,
* so there's no way that upper quals could refer to their
* righthand sides, and no point in checking. We don't expect
- * to see JOIN_RIGHT_ANTI yet.
+ * to see JOIN_RIGHT_SEMI or JOIN_RIGHT_ANTI yet.
*/
break;
default:
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 855009fd6e..0d71d821f7 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -306,6 +306,7 @@ typedef enum JoinType
*/
JOIN_SEMI, /* 1 copy of each LHS row that has match(es) */
JOIN_ANTI, /* 1 copy of each LHS row that has no match */
+ JOIN_RIGHT_SEMI, /* 1 copy of each RHS row that has match(es) */
JOIN_RIGHT_ANTI, /* 1 copy of each RHS row that has no match */
/*
@@ -322,10 +323,10 @@ typedef enum JoinType
/*
* OUTER joins are those for which pushed-down quals must behave differently
- * from the join's own quals. This is in fact everything except INNER and
- * SEMI joins. However, this macro must also exclude the JOIN_UNIQUE symbols
- * since those are temporary proxies for what will eventually be an INNER
- * join.
+ * from the join's own quals. This is in fact everything except INNER, SEMI
+ * and RIGHT_SEMI joins. However, this macro must also exclude the
+ * JOIN_UNIQUE symbols since those are temporary proxies for what will
+ * eventually be an INNER join.
*
* Note: semijoins are a hybrid case, but we choose to treat them as not
* being outer joins. This is okay principally because the SQL syntax makes
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..14ccfc1ac1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2823,9 +2823,9 @@ typedef struct PlaceHolderVar
* min_lefthand and min_righthand for higher joins.)
*
* jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching
- * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_ANTI either.
- * So the allowed values of jointype in a join_info_list member are only
- * LEFT, FULL, SEMI, or ANTI.
+ * the inputs to make it a LEFT JOIN. It's never JOIN_RIGHT_SEMI or
+ * JOIN_RIGHT_ANTI either. So the allowed values of jointype in a
+ * join_info_list member are only LEFT, FULL, SEMI, or ANTI.
*
* ojrelid is the RT index of the join RTE representing this outer join,
* if there is one. It is zero when jointype is INNER or SEMI, and can be
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a676..9142dab171 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1905,23 +1905,24 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
- QUERY PLAN
-----------------------------------------------
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous and a.tenthous < 5000;
+ QUERY PLAN
+-----------------------------------------------
Hash Semi Join
- Hash Cond: (a.twothousand = b.twothousand)
- Join Filter: (a.fivethous <> b.fivethous)
+ Hash Cond: (b.twothousand = c.twothousand)
+ Join Filter: (b.fivethous <> c.fivethous)
-> Hash Join
- Hash Cond: (a.tenthous = i4.f1)
- -> Seq Scan on tenk1 a
+ Hash Cond: (b.tenthous = a.tenthous)
+ -> Seq Scan on tenk1 b
-> Hash
- -> Seq Scan on int4_tbl i4
+ -> Seq Scan on tenk1 a
+ Filter: (tenthous < 5000)
-> Hash
- -> Seq Scan on tenk1 b
-(10 rows)
+ -> Seq Scan on tenk1 c
+(11 rows)
--
-- More complicated constructs
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 6d07f86b9b..53591a4f2d 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -2543,24 +2543,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -2752,24 +2752,24 @@ SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: (t1_1.a = t2_1.b)
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_1.b = t1_1.a)
+ -> Seq Scan on prt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Hash Semi Join
- Hash Cond: (t1_2.a = t2_2.b)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_2.b = t1_2.a)
+ -> Seq Scan on prt2_adv_p2 t2_2
-> Hash
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Hash Semi Join
- Hash Cond: (t1_3.a = t2_3.b)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Hash Right Semi Join
+ Hash Cond: (t2_3.b = t1_3.a)
+ -> Seq Scan on prt2_adv_p3 t2_3
-> Hash
- -> Seq Scan on prt2_adv_p3 t2_3
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(21 rows)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
@@ -3036,25 +3036,25 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_adv t1 INNER JOIN prt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: (t1.a = t2.b)
+ -> Hash Right Semi Join
+ Hash Cond: (t2.b = t1.a)
-> Append
- -> Seq Scan on prt1_adv_p1 t1_1
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p2 t1_2
- Filter: (b = 0)
- -> Seq Scan on prt1_adv_p3 t1_3
- Filter: (b = 0)
+ -> Seq Scan on prt2_adv_p1 t2_1
+ -> Seq Scan on prt2_adv_p2 t2_2
+ -> Seq Scan on prt2_adv_p3_1 t2_3
+ -> Seq Scan on prt2_adv_p3_2 t2_4
-> Hash
-> Append
- -> Seq Scan on prt2_adv_p1 t2_1
- -> Seq Scan on prt2_adv_p2 t2_2
- -> Seq Scan on prt2_adv_p3_1 t2_3
- -> Seq Scan on prt2_adv_p3_2 t2_4
+ -> Seq Scan on prt1_adv_p1 t1_1
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p2 t1_2
+ Filter: (b = 0)
+ -> Seq Scan on prt1_adv_p3 t1_3
+ Filter: (b = 0)
(17 rows)
-- left join
@@ -3433,27 +3433,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3623,27 +3626,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Nested Loop Semi Join
- Join Filter: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
-> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3 t2_3
-(18 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
@@ -3839,25 +3845,25 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
---------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------
Sort
Sort Key: t1.a
- -> Hash Semi Join
- Hash Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
+ -> Hash Right Semi Join
+ Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
- -> Seq Scan on plt1_adv_p1 t1_1
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Seq Scan on plt2_adv_p1 t2_1
+ -> Seq Scan on plt2_adv_p2_1 t2_2
+ -> Seq Scan on plt2_adv_p2_2 t2_3
+ -> Seq Scan on plt2_adv_p3 t2_4
-> Hash
-> Append
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Seq Scan on plt2_adv_p2_1 t2_2
- -> Seq Scan on plt2_adv_p2_2 t2_3
- -> Seq Scan on plt2_adv_p3 t2_4
+ -> Seq Scan on plt1_adv_p1 t1_1
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
(17 rows)
-- left join
@@ -3987,28 +3993,30 @@ SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a =
-- semi join
EXPLAIN (COSTS OFF)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------
Sort
Sort Key: t1.a
-> Append
- -> Hash Semi Join
- Hash Cond: ((t1_1.a = t2_1.a) AND (t1_1.c = t2_1.c))
- -> Seq Scan on plt1_adv_p1_null t1_1
- Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_1.a = t1_1.a) AND (t2_1.c = t1_1.c))
+ -> Seq Scan on plt2_adv_p1 t2_1
-> Hash
- -> Seq Scan on plt2_adv_p1 t2_1
- -> Nested Loop Semi Join
- Join Filter: ((t1_2.a = t2_2.a) AND (t1_2.c = t2_2.c))
- -> Seq Scan on plt1_adv_p2 t1_2
- Filter: (b < 10)
+ -> Seq Scan on plt1_adv_p1_null t1_1
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_2.a = t1_2.a) AND (t2_2.c = t1_2.c))
-> Seq Scan on plt2_adv_p2 t2_2
- -> Nested Loop Semi Join
- Join Filter: ((t1_3.a = t2_3.a) AND (t1_3.c = t2_3.c))
- -> Seq Scan on plt1_adv_p3 t1_3
- Filter: (b < 10)
+ -> Hash
+ -> Seq Scan on plt1_adv_p2 t1_2
+ Filter: (b < 10)
+ -> Hash Right Semi Join
+ Hash Cond: ((t2_3.a = t1_3.a) AND (t2_3.c = t1_3.c))
-> Seq Scan on plt2_adv_p3_null t2_3
-(19 rows)
+ -> Hash
+ -> Seq Scan on plt1_adv_p3 t1_3
+ Filter: (b < 10)
+(21 rows)
SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 87273fa635..c96285d1bb 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -1076,26 +1076,26 @@ reset role;
explain (costs off, verbose)
select count(*) from tenk1 a where (unique1, two) in
(select unique1, row_number() over() from tenk1 b);
- QUERY PLAN
-----------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------
Aggregate
Output: count(*)
- -> Hash Semi Join
- Hash Cond: ((a.unique1 = b.unique1) AND (a.two = (row_number() OVER (?))))
- -> Gather
+ -> Hash Right Semi Join
+ Hash Cond: ((b.unique1 = a.unique1) AND ((row_number() OVER (?)) = a.two))
+ -> WindowAgg
+ Output: b.unique1, row_number() OVER (?)
+ -> Gather
+ Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
+ Output: b.unique1
+ -> Hash
Output: a.unique1, a.two
- Workers Planned: 4
- -> Parallel Seq Scan on public.tenk1 a
+ -> Gather
Output: a.unique1, a.two
- -> Hash
- Output: b.unique1, (row_number() OVER (?))
- -> WindowAgg
- Output: b.unique1, row_number() OVER (?)
- -> Gather
- Output: b.unique1
- Workers Planned: 4
- -> Parallel Index Only Scan using tenk1_unique1 on public.tenk1 b
- Output: b.unique1
+ Workers Planned: 4
+ -> Parallel Seq Scan on public.tenk1 a
+ Output: a.unique1, a.two
(18 rows)
-- LIMIT/OFFSET within sub-selects can't be pushed to workers.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568bc4..9c21b76800 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3287,10 +3287,10 @@ NOTICE: snooped value: 8
SELECT * FROM v1 WHERE b=8;
a | b | c | d
---+---+------+------
- 9 | 8 | t1 | t11d
- 9 | 8 | t11 | t11d
- 9 | 8 | t12 | t11d
9 | 8 | t111 | t11d
+ 9 | 8 | t12 | t11d
+ 9 | 8 | t11 | t11d
+ 9 | 8 | t1 | t11d
(4 rows)
DELETE FROM v1 WHERE snoop(a) AND leakproof(a); -- should not delete everything, just where a>5
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7ba6..e3d2652083 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -214,10 +214,10 @@ SELECT *
-- semijoin selectivity for <>
--
explain (costs off)
-select * from int4_tbl i4, tenk1 a
-where exists(select * from tenk1 b
- where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
- and i4.f1 = a.tenthous;
+select * from tenk1 a, tenk1 b
+where exists(select * from tenk1 c
+ where b.twothousand = c.twothousand and b.fivethous <> c.fivethous)
+ and a.tenthous = b.tenthous and a.tenthous < 5000;
--
--
2.43.0
Hi Richard Guo
Thank you for updating the patch.Tested on v8 , It looks good to me
Thanks
Richard Guo <guofenglinux@gmail.com> 于2024年7月4日周四 17:18写道:
Show quoted text
On Fri, Jun 28, 2024 at 3:21 PM Richard Guo <guofenglinux@gmail.com>
wrote:On Fri, Jun 28, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com>
wrote:
I've refined this test case further to make it more stable by using an
additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus
blank line in ExecHashJoinImpl(). I've removed it in the v7 patch.BTW, I've also verified the empty-rel optimization for hash join and
AFAICT it works correctly for the new right-semi join.Here is a new rebase.
Barring objections, I'm planning to push it soon.
Thanks
Richard
On Thu, 04 Jul 2024 at 17:17, Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jun 28, 2024 at 3:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Jun 28, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com> wrote:
I've refined this test case further to make it more stable by using an
additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus
blank line in ExecHashJoinImpl(). I've removed it in the v7 patch.BTW, I've also verified the empty-rel optimization for hash join and
AFAICT it works correctly for the new right-semi join.Here is a new rebase.
Barring objections, I'm planning to push it soon.
Thanks for updating the patch. It looks good to me, except for a minor nitpick:
s/right-semijoin/right-semi join/
--
Regrads,
Japin Li
On Thu, Jul 4, 2024 at 11:18 PM Japin Li <japinli@hotmail.com> wrote:
On Thu, 04 Jul 2024 at 17:17, Richard Guo <guofenglinux@gmail.com> wrote:
Here is a new rebase.
Barring objections, I'm planning to push it soon.
Pushed. Thanks for all the reviews.
Thanks for updating the patch. It looks good to me, except for a minor nitpick:
s/right-semijoin/right-semi join/
I did not take this one. The comment nearby for RIGHT_ANTI uses
'right-antijoin', and I think we'd better adopt a consistent pattern for
RIGHT_SEMI.
Thanks
Richard