Pathify RHS unique-ification for semijoin planning
I came across a query where the plan includes some unnecessary Sort
nodes. Here's an example that shows the issue.
create table t(a int, b int);
insert into t select i%100, i from generate_series(1,10000)i;
create index on t(a);
vacuum analyze t;
set enable_hashagg to off;
explain (costs off)
select * from t t1 where t1.a in
(select a from t t2 where a < 10)
order by t1.a;
                          QUERY PLAN
---------------------------------------------------------------
 Merge Join
   Merge Cond: (t1.a = t2.a)
   ->  Index Scan using t_a_idx on t t1
   ->  Sort
         Sort Key: t2.a
         ->  Unique
               ->  Sort
                     Sort Key: t2.a
                     ->  Index Only Scan using t_a_idx on t t2
                           Index Cond: (a < 10)
(10 rows)
I believe the two Sort nodes are unnecessary.
After some digging, it seems that this is related to one of the
approaches we use to implement semijoins: unique-ifying the RHS and
then doing a regular join. The unique-ification is handled in
create_unique_path(), which considers both hash-based and sort-based
implementations. However, in the case of sort-based implementation,
this function pays no attention to the subpath's pathkeys or the
pathkeys of the resulting output.
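For comparison, the unique-ification can be written out by hand as a
DISTINCT subquery (an illustrative rewrite on the same schema, not part
of the proposal).  Planned that way, the duplicate removal goes through
the ordinary DISTINCT machinery, which does pay attention to input
ordering; the resulting plan shape of course still depends on costs:

```sql
-- Hand-written equivalent of the semijoin above
explain (costs off)
select t1.* from t t1
  join (select distinct a from t t2 where a < 10) s on t1.a = s.a
order by t1.a;
```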
In addition to this specific issue, it seems to me that there are
other potential issues in create_unique_path().
* Currently, we only consider the cheapest_total_path of the RHS when
unique-ifying it. I think this may cause us to miss some optimization
opportunities. For example, there might be a path with a better sort
order that isn't the cheapest-total one. Such a path could help avoid
a sort at a higher level, potentially resulting in a cheaper overall
plan.
* In create_unique_path(), we currently rely on heuristics to decide
whether to use a hash-based or sort-based method. I think a better
approach would be to create paths for both methods and let add_path()
determine which one is better, similar to how we handle path selection
in other parts of the planner.
Therefore, I'm thinking that maybe we could create a new RelOptInfo
for the RHS rel to represent its unique-ified version, and then
generate all worthwhile paths for it, similar to how it's done in
create_distinct_paths(). Since this is likely to be called repeatedly
on the same rel, we can cache the new RelOptInfo in the rel struct,
just like how we cache cheapest_unique_path currently.
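In pseudocode, the cached lookup might look something like this
(hypothetical function and field names, modeled on
create_distinct_paths()):

```
create_unique_rel(root, rel, sjinfo):
    if rel->unique_rel is already set:
        return rel->unique_rel          # cached from a previous join order
    if the RHS cannot be unique-ified:
        return NULL
    unique_rel = build a new RelOptInfo for the unique-ified RHS
    add sort-based paths (preserving useful input pathkeys)
    add hash-based paths
    set_cheapest(unique_rel)
    rel->unique_rel = unique_rel        # cache for repeated calls
    return unique_rel
```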
To be concrete, I'm envisioning something like the following:
    if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
-       create_unique_path(root, rel2, rel2->cheapest_total_path,
-                          sjinfo) != NULL)
+       (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
...
-       add_paths_to_joinrel(root, joinrel, rel1, rel2,
-                            JOIN_UNIQUE_INNER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
+                            JOIN_INNER, sjinfo,
                             restrictlist);
-       add_paths_to_joinrel(root, joinrel, rel2, rel1,
-                            JOIN_UNIQUE_OUTER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
+                            JOIN_INNER, sjinfo,
                             restrictlist);
In addition, by changing the join from "rel1" and "rel2" using
JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER to a join between "rel1" and
"rel2_unique" using a standard JOIN_INNER, we might be able to get
rid of the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes. This
could simplify a lot of logic in joinpath.c, where we're increasingly
adding special-case handling for these two jointypes.
One last point: I doubt that the code related to UNIQUE_PATH_NOOP is
reachable in practice. create_unique_path() checks whether the input
rel is an RTE_RELATION or RTE_SUBQUERY and is provably unique, and
creates a UNIQUE_PATH_NOOP UniquePath if so. However, in that case,
the semijoin should have already been simplified to a plain inner join
by analyzejoins.c.
Any thoughts?
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
Hi,
However, in the case of sort-based implementation,
this function pays no attention to the subpath's pathkeys or the
pathkeys of the resulting output.
Good finding!
In addition to this specific issue, it seems to me that there are
other potential issues in create_unique_path().

* Currently, we only consider the cheapest_total_path of the RHS when
unique-ifying it.
I think it would also be worth checking tuple_fraction to factor in
startup cost, for those paths whose total cost is high but whose
required fraction of rows is low.
I think this may cause us to miss some optimization
opportunities. For example, there might be a path with a better sort
order that isn't the cheapest-total one. Such a path could help avoid
a sort at a higher level, potentially resulting in a cheaper overall
plan.
* In create_unique_path(), we currently rely on heuristics to decide
whether to use a hash-based or sort-based method. I think a better
approach would be to create paths for both methods and let add_path()
determine which one is better, similar to how we handle path selection
in other parts of the planner.

Therefore, I'm thinking that maybe we could create a new RelOptInfo
for the RHS rel to represent its unique-ified version, and then
generate all worthwhile paths for it,
This sounds great to me. I think we can keep the fractional-cheapest
path on the new RelOptInfo as well; then everything should fall into
place.
To be concrete, I'm envisioning something like the following:
    if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
-       create_unique_path(root, rel2, rel2->cheapest_total_path,
-                          sjinfo) != NULL)
+       (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
...
-       add_paths_to_joinrel(root, joinrel, rel1, rel2,
-                            JOIN_UNIQUE_INNER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
+                            JOIN_INNER, sjinfo,
                             restrictlist);
-       add_paths_to_joinrel(root, joinrel, rel2, rel1,
-                            JOIN_UNIQUE_OUTER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
+                            JOIN_INNER, sjinfo,
                             restrictlist);

In addition, by changing the join from "rel1" and "rel2" using
JOIN_UNIQUE_OUTER or JOIN_UNIQUE_INNER to a join between "rel1" and
"rel2_unique" using a standard JOIN_INNER, we might be able to get
rid of the JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER jointypes.
if we can, +10.
--
Best Regards
Andy Fan
On Thu, 22 May 2025 at 17:28, Andy Fan <zhihuifan1213@163.com> wrote:
Agreed.
Could you kindly post a patch for this?
On Thu, May 22, 2025 at 4:05 PM Richard Guo <guofenglinux@gmail.com> wrote:
Therefore, I'm thinking that maybe we could create a new RelOptInfo
for the RHS rel to represent its unique-ified version, and then
generate all worthwhile paths for it, similar to how it's done in
create_distinct_paths(). Since this is likely to be called repeatedly
on the same rel, we can cache the new RelOptInfo in the rel struct,
just like how we cache cheapest_unique_path currently.

To be concrete, I'm envisioning something like the following:

    if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
-       create_unique_path(root, rel2, rel2->cheapest_total_path,
-                          sjinfo) != NULL)
+       (rel2_unique = create_unique_rel(root, rel2, sjinfo)) != NULL)
...
-       add_paths_to_joinrel(root, joinrel, rel1, rel2,
-                            JOIN_UNIQUE_INNER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel1, rel2_unique,
+                            JOIN_INNER, sjinfo,
                             restrictlist);
-       add_paths_to_joinrel(root, joinrel, rel2, rel1,
-                            JOIN_UNIQUE_OUTER, sjinfo,
+       add_paths_to_joinrel(root, joinrel, rel2_unique, rel1,
+                            JOIN_INNER, sjinfo,
                             restrictlist);
Here is a WIP draft patch based on this idea. It retains
JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to help determine whether the
inner relation is provably unique, but otherwise removes most of the
code related to these two join types.
Additionally, the T_Unique path now has the same meaning for both
semijoins and DISTINCT clauses: it represents adjacent-duplicate
removal on presorted input. This patch unifies their handling by
sharing the same data structures and functions.
There are a few plan diffs in the regression tests. As far as I can
tell, the changes are improvements. One of them is caused by the fact
that we now consider parameterized paths in unique-ified cases. The
rest are mostly a result of now preserving pathkeys for unique paths.
This patch is still a work in progress. Before investing too much
time into it, I'd like to get some feedback on whether it's heading in
the right direction.
Thanks
Richard
Attachments:
v1-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch (application/octet-stream)
From c525d28b3cb0a1bcc3fc5205f5b5e1ebbdb69a2a Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v1] Pathify RHS unique-ification for semijoin planning
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/joinpath.c | 281 +++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +-----------
src/backend/optimizer/plan/planner.c | 469 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +-----------
src/backend/optimizer/util/relnode.c | 10 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 45 +-
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 17 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 51 +-
src/tools/pgindent/typedefs.list | 2 -
16 files changed, 698 insertions(+), 939 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 26f0336f1e4..7a91641cbd9 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -113,12 +113,12 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* direction from what's indicated in sjinfo.
*
* Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
- * with sjinfo->jointype == JOIN_SEMI indicates that.
+ * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that the outer or inner
+ * relation has been unique-ified and a regular inner join should then be
+ * applied. These values are not allowed to propagate outside this module,
+ * however. Path cost estimation code may need to recognize that it's dealing
+ * with such a case --- the combination of nominal jointype INNER with
+ * sjinfo->jointype == JOIN_SEMI indicates that.
*/
void
add_paths_to_joinrel(PlannerInfo *root,
@@ -161,10 +161,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* (else reduce_unique_semijoins would've simplified it), so there's no
* point in calling innerrel_is_unique. However, if the LHS covers all of
* the semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -1376,6 +1376,13 @@ sort_inner_and_outer(PlannerInfo *root,
if (extra->mergeclause_list == NIL)
return;
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* We only consider the cheapest-total-cost input paths, since we are
* assuming here that a sort is required. We will consider
@@ -1402,25 +1409,6 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
* merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
@@ -1441,7 +1429,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,12 +1568,10 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
+ Assert(jointype != JOIN_UNIQUE_OUTER && jointype != JOIN_UNIQUE_INNER);
/* Look for useful mergeclauses (if any) */
mergeclauses =
@@ -1636,10 +1622,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1877,20 +1859,7 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
- return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1883,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1891,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1942,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2035,25 +1972,22 @@ match_unsorted_outer(PlannerInfo *root,
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2052,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2096,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2232,6 +2142,13 @@ hash_inner_and_outer(PlannerInfo *root,
List *hashclauses;
ListCell *l;
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* We need to build only one hashclauses list for any given pair of outer
* and inner relations; all of the hashable clauses will be used as keys.
@@ -2290,6 +2207,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,102 +2220,55 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
-
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
@@ -2441,9 +2313,8 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
if (save_jointype == JOIN_FULL ||
@@ -2453,7 +2324,7 @@ hash_inner_and_outer(PlannerInfo *root,
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 60d65762b5d..fec359c28f6 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -895,6 +894,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -1000,14 +1001,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1015,10 +1015,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4ad30b7627e..1658b20f17e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -293,9 +290,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -467,19 +464,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1710,207 +1697,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2268,13 +2054,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2288,7 +2074,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6761,61 +6548,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6878,7 +6616,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ff65867eebe..f41be9ca2a0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4917,10 +4923,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5111,10 +5117,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8248,3 +8254,450 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return (RelOptInfo *) rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ rel->rows,
+ NULL,
+ NULL);
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's tlist.
+ *
+ * While in the loop, build the lists of SortGroupClauses representing
+ * the ordering for the sort-based implementation and the grouping for
+ * the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally these
+ * are the same as the IN clause operators, but if those are
+ * cross-type operators then the equality operators are the ones
+ * for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to use.
+ * Normally these are the same as the IN clause operators, but if
+ * those are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ if (!IS_OTHER_REL(rel))
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ else
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There should be no partial paths for the unique relation; otherwise, we
+ * won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != input_rel->cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_rel->cheapest_total_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ partial_unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..2ee06dc7317 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1712,246 +1710,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2003,36 +1761,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2790,8 +2518,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +2773,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +2820,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3171,13 +2896,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3186,21 +2908,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +2977,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..469b98009b9 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +714,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +748,8 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +908,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +934,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1491,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+ JOIN_UNIQUE_OUTER, /* LHS has been made unique */
+ JOIN_UNIQUE_INNER, /* RHS has been made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..c7e80113e95 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -924,7 +924,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1002,6 +1001,12 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ struct RelOptInfo *unique_rel;
+ List *unique_pathkeys;
+
/*
* used by various scans and joins:
*/
@@ -1739,8 +1744,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2137,34 +2142,6 @@ typedef struct MemoizePath
* if unknown */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2371,17 +2348,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..71d2945b175 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
double calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f35a0b18c37..bb1807b4521 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9226,23 +9226,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Nested Loop
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ -> Index Only Scan using onek_unique1 on public.onek t1
+ Output: t1.unique1
+ Index Cond: (t1.unique1 < 1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
- Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t3.thousand = t1.unique1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 40d8056fcea..f1d91dfa4ca 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2612,21 +2612,23 @@ explain (costs off)
SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd)
WHERE a.thousand < 750;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------
Hash Join
Hash Cond: (c.odd = b.odd)
- -> Hash Join
- Hash Cond: (a.hundred = c.hundred)
- -> Seq Scan on tenk1 a
- Filter: (thousand < 750)
- -> Hash
- -> HashAggregate
- Group Key: c.odd, c.hundred
- -> Seq Scan on tenk2 c
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+ -> Memoize
+ Cache Key: c.hundred
+ Cache Mode: logical
+ -> Index Scan using tenk1_hundred on tenk1 a
+ Index Cond: (hundred = c.hundred)
+ Filter: (thousand < 750)
-> Hash
-> Seq Scan on tenk2 b
-(12 rows)
+(14 rows)
-- we can pull up the aggregate sublink into RHS of a left join.
explain (costs off)
@@ -2672,18 +2674,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +2859,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8346cda633..6b715d456c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3140,7 +3140,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueState
UnlistenStmt
UnresolvedTup
@@ -3155,7 +3154,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
On Wed, May 28, 2025 at 10:58 AM Richard Guo <guofenglinux@gmail.com> wrote:
> This patch is still a work in progress. Before investing too much
> time into it, I'd like to get some feedback on whether it's heading in
> the right direction.
Here is an updated version of the patch, which is ready for review.
I've fixed a cost estimation issue, improved some comments, and added
a commit message. Nothing essential has changed.
Thanks
Richard
Attachments:
v2-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch
From 62f782bd801c1e8eec841e329ca77c5633a05cc1 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v2] Pathify RHS unique-ification for semijoin planning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, using various paths of the original RHS rel as input to the
unique-ification step. Both hash-based and sort-based implementations
are considered. All resulting paths compete in add_path(), and those
deemed worthy of consideration are added to the new rel. Finally, the
unique-ified rel is joined with the other side of the semijoin using a
plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes — used to indicate that the LHS or RHS
path should be made unique — has been removed. Besides, the T_Unique
path now has the same meaning for both semijoins and upper DISTINCT
clauses: it represents adjacent-duplicate removal on presorted input.
This patch unifies their handling by sharing the same data structures
and functions.
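To make the intended effect concrete, here is a sketch of what the motivating query upthread should produce once sort-based unique-ification pays attention to pathkeys. The table and index are the ones created at the top of the thread; the plan shown in the comment is an expectation based on the regression-test changes in this patch, not verbatim planner output, and the exact shape will depend on costing:

```sql
-- With the patch, the Unique step can consume the index's output order
-- directly, so both redundant Sort nodes from the original plan should
-- disappear, giving roughly:
--
--   Merge Join
--     Merge Cond: (t1.a = t2.a)
--     -> Index Scan using t_a_idx on t t1
--     -> Unique
--          -> Index Only Scan using t_a_idx on t t2
--               Index Cond: (a < 10)
explain (costs off)
select * from t t1
where t1.a in (select a from t t2 where a < 10)
order by t1.a;
```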
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 6 +-
src/backend/optimizer/path/joinpath.c | 335 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +----------
src/backend/optimizer/plan/planner.c | 509 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 66 ++-
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 17 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1037 insertions(+), 973 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..2da6880c152 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3937,7 +3937,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* The whole issue is moot if we are working from a unique-ified outer
* input, or if we know we don't need to mark/restore at all.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (IsA(outer_path, UniquePath) ||
+ IsA(outer_path, AggPath) ||
+ path->skip_mark_restore)
rescannedtuples = 0;
else
{
@@ -4332,7 +4334,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 26f0336f1e4..c4248ac9d13 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,13 +112,13 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
- * with sjinfo->jointype == JOIN_SEMI indicates that.
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code may need to recognize that it's dealing with such a case ---
+ * the combination of nominal jointype INNER with sjinfo->jointype == JOIN_SEMI
+ * indicates that.
*/
void
add_paths_to_joinrel(PlannerInfo *root,
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -161,10 +162,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* (else reduce_unique_semijoins would've simplified it), so there's no
* point in calling innerrel_is_unique. However, if the LHS covers all of
* the semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -201,6 +202,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -331,7 +339,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -341,7 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1364,7 +1372,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1402,38 +1409,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1441,7 +1426,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,13 +1565,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1636,10 +1617,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1819,7 +1796,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1855,12 +1831,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1877,20 +1847,7 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
- return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1871,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1879,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1930,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2019,41 +1944,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2037,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2081,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2227,7 +2122,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2290,6 +2184,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,114 +2197,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2421,11 +2267,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2441,19 +2285,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 60d65762b5d..fec359c28f6 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -895,6 +894,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -1000,14 +1001,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1015,10 +1015,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 4ad30b7627e..1658b20f17e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -293,9 +290,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -467,19 +464,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1710,207 +1697,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2268,13 +2054,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2288,7 +2074,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6761,61 +6548,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6878,7 +6616,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ff65867eebe..a7daf757c9e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4917,10 +4923,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5111,10 +5117,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8248,3 +8254,490 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ rel->rows,
+ NULL,
+ NULL);
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != input_rel->cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_rel->cheapest_total_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ partial_unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..2ee06dc7317 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1712,246 +1710,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2003,36 +1761,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2790,8 +2518,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +2773,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +2820,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3171,13 +2896,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3186,21 +2908,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +2977,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+	JOIN_UNIQUE_OUTER,			/* LHS has to be made unique */
+	JOIN_UNIQUE_INNER,			/* RHS has to be made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..45f0b9c8ee9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -700,8 +700,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -764,6 +762,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -924,7 +937,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1002,6 +1014,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1739,8 +1761,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2137,34 +2159,6 @@ typedef struct MemoizePath
* if unknown */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2371,17 +2365,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..71d2945b175 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
double calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f35a0b18c37..bb1807b4521 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9226,23 +9226,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Nested Loop
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ -> Index Only Scan using onek_unique1 on public.onek t1
+ Output: t1.unique1
+ Index Cond: (t1.unique1 < 1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
- Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t3.thousand = t1.unique1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 40d8056fcea..66732f9b866 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Hash Cond: (t2.a = ((t3.b + 1)))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ -> HashAggregate
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index fec38ef85a6..a93fd222441 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8346cda633..6b715d456c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3140,7 +3140,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueState
UnlistenStmt
UnresolvedTup
@@ -3155,7 +3154,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
On Tue, Jun 3, 2025 at 4:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Here is an updated version of the patch, which is ready for review.
> I've fixed a cost estimation issue, improved some comments, and added
> a commit message. Nothing essential has changed.
This patch no longer applies, so here is a new rebase.
It may be argued that this patch introduces additional planning
overhead by considering multiple unique-ification paths for the RHS.
While that is true to some extent, I don't think this is a problem.
Please bear with me a moment.
* The additional path generation only occurs in specific semijoin
cases where one input rel is exactly the RHS. Queries without such
semijoins are not affected.
* This patch only considers the cheapest total path and presorted
paths from the original RHS. These are typically few in number, and
each has a high likelihood of contributing to a lower overall cost for
the final plan. I think the cost-benefit trade-off is worthwhile.
* This patch follows the convention in joinpath.c of exploring
alternative join input paths, rather than introducing novel overhead.
For example, when planning (A SEMIJOIN B), the planner considers
multiple paths from B, including its cheapest total path and any paths
with useful sort orders. There is no clear reason why, in the
analogous case of (A INNERJOIN unique-ified(B)), we should restrict
ourselves to only one path from the unique-ified RHS.
* On the other hand, if we insist on considering only a single path
from the unique-ified RHS, we face a dilemma when the hash-based
implementation has a cheaper total cost, but the sort-based
implementation has a better sort order. In such cases, what should
our selection criteria be? Currently, create_unique_path() simply
compares total_cost to choose the cheaper one (even without applying a
fuzz factor), and ignores sort order entirely. I don't think this
approach makes sense. It is also inconsistent with the general
pathification framework, where we rely on add_path() to retain the
best path or set of paths based on cost and other metrics, rather than
using such simple heuristics.
Another point I'd like to mention is that this patch removes the
UNIQUE_PATH_NOOP-related code along the way, because I think it's dead
code. If the RHS rel is provably unique, the semijoin should have
already been simplified to a plain inner join by analyzejoins.c.
However, I might be overlooking something, and I'd appreciate any
feedback or corrections.
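To illustrate why I believe it's dead code, here is a minimal sketch
(table names are made up for the example, and I haven't re-verified the
exact plan shape): with a primary key on the RHS column, the RHS is
provably unique, so reduce_unique_semijoins() in analyzejoins.c should
already have turned the semijoin into a plain inner join before path
generation, meaning no unique-ification step (and hence no
UNIQUE_PATH_NOOP) is ever reached:

```sql
-- hypothetical tables; because uniq_rhs.a is provably unique via the
-- primary key, the semijoin below should be reduced to a plain join,
-- with no Unique or HashAggregate node in the plan
create table uniq_rhs (a int primary key);
create table lhs_tbl (a int);
explain (costs off)
select * from lhs_tbl where a in (select a from uniq_rhs);
```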
Thanks
Richard
Attachment: v3-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch
From dabdb05c2970a8c49502ccc66c07b67e88aaada0 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v3] Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. In addition, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 6 +-
src/backend/optimizer/path/joinpath.c | 335 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +----------
src/backend/optimizer/plan/planner.c | 509 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 66 ++-
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 17 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1037 insertions(+), 973 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..2da6880c152 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3937,7 +3937,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* The whole issue is moot if we are working from a unique-ified outer
* input, or if we know we don't need to mark/restore at all.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (IsA(outer_path, UniquePath) ||
+ IsA(outer_path, AggPath) ||
+ path->skip_mark_restore)
rescannedtuples = 0;
else
{
@@ -4332,7 +4334,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 7aa8f5d799c..72a70cefdc6 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,13 +112,13 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
- * with sjinfo->jointype == JOIN_SEMI indicates that.
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code may need to recognize that it's dealing with such a case ---
+ * the combination of nominal jointype INNER with sjinfo->jointype == JOIN_SEMI
+ * indicates that.
*/
void
add_paths_to_joinrel(PlannerInfo *root,
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -161,10 +162,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* (else reduce_unique_semijoins would've simplified it), so there's no
* point in calling innerrel_is_unique. However, if the LHS covers all of
* the semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this routine.
*/
switch (jointype)
{
@@ -201,6 +202,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -331,7 +339,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -341,7 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1361,7 +1369,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1399,38 +1406,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1438,7 +1423,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1577,13 +1562,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1633,10 +1614,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1816,7 +1793,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1852,12 +1828,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1874,20 +1844,7 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
- return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1911,20 +1868,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1933,21 +1876,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -1998,17 +1927,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2016,41 +1941,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2115,24 +2034,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2166,23 +2078,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2224,7 +2119,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2287,6 +2181,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2298,114 +2194,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2418,11 +2264,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2438,19 +2282,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..535248aa525 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -886,6 +885,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -991,14 +992,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1006,10 +1006,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0b61aef962c..6752e1bd902 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -293,9 +290,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -467,19 +464,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1710,207 +1697,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2268,13 +2054,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2288,7 +2074,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6821,61 +6608,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6938,7 +6676,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..18be12808f3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4917,10 +4923,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5111,10 +5117,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8248,3 +8254,490 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ rel->rows,
+ NULL,
+ NULL);
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != input_rel->cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_rel->cheapest_total_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * Nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ partial_unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..2ee06dc7317 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1712,246 +1710,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2003,36 +1761,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2790,8 +2518,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +2773,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +2820,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3171,13 +2896,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3186,21 +2908,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +2977,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+ JOIN_UNIQUE_OUTER, /* LHS has been made unique */
+ JOIN_UNIQUE_INNER, /* RHS has been made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..45f0b9c8ee9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -700,8 +700,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -764,6 +762,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -924,7 +937,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1002,6 +1014,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1739,8 +1761,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2137,34 +2159,6 @@ typedef struct MemoizePath
* if unknown */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2371,17 +2365,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..71d2945b175 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
double calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 390aabfb34b..86e5f3bb30c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9468,23 +9468,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Nested Loop
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ -> Index Only Scan using onek_unique1 on public.onek t1
+ Output: t1.unique1
+ Index Cond: (t1.unique1 < 1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
- Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t3.thousand = t1.unique1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 40d8056fcea..66732f9b866 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Hash Cond: (t2.a = ((t3.b + 1)))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ -> HashAggregate
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index fec38ef85a6..a93fd222441 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..0d181242d6b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3148,7 +3148,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueRelInfo
UniqueState
UnlistenStmt
@@ -3164,7 +3163,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
On Tue, Jul 1, 2025 at 11:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Jun 3, 2025 at 4:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
Here is an updated version of the patch, which is ready for review.
I've fixed a cost estimation issue, improved some comments, and added
a commit message. Nothing essential has changed.
This patch does not apply anymore, and here is a new rebase.
This patch has stopped applying again, so here is a new rebase.
This version also fixes an issue related to parameterized paths: if
the RHS has LATERAL references to the LHS, unique-ification becomes
meaningless because the RHS depends on the LHS, and such paths should
not be generated.
Thanks
Richard
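To make the cheapest-total-path drawback concrete, here is a toy cost model (invented numbers, not PostgreSQL planner code, and `SORT_COST` is a purely hypothetical figure): unique-ifying only the cheapest-total path forces an explicit Sort that a slightly pricier presorted path would avoid.

```python
SORT_COST = 40  # hypothetical cost of an explicit Sort node

def sort_uniquify_cost(path_cost, presorted):
    # Sort-based unique-ification: an explicit Sort is needed unless the
    # input path already delivers rows in the required order.
    return path_cost + (0 if presorted else SORT_COST)

seqscan = (100, False)  # cheapest in total cost, but unordered
idxscan = (110, True)   # more expensive, but ordered on the join key

# Old behavior: only the cheapest-total path is unique-ified.
old_cost = sort_uniquify_cost(*seqscan)

# Pathified behavior: every candidate path competes, add_path-style.
new_cost = min(sort_uniquify_cost(*p) for p in (seqscan, idxscan))

print(old_cost, new_cost)  # 140 110
```

With these made-up numbers the presorted index path wins despite its higher raw cost, which is exactly the opportunity the cheapest-total-only approach misses.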
Attachments:
v4-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch (application/octet-stream)
From e1a557dee85b73e43b302598b13ce57f13a6c313 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v4] Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. In addition, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
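The hash-versus-sort competition described above can be sketched as a two-dimensional dominance test. A minimal Python sketch follows (all costs invented; the real add_path() also compares startup cost, parameterization, row counts, and parallel safety):

```python
from typing import NamedTuple

class Path(NamedTuple):
    total_cost: float
    sorted_output: bool  # stand-in for the planner's pathkeys comparison

def dominates(a: Path, b: Path) -> bool:
    # a dominates b if it is no worse on cost and no worse on ordering.
    return (a.total_cost <= b.total_cost and
            (a.sorted_output or not b.sorted_output))

def add_path(paths: list, new: Path) -> list:
    # Reject `new` if an existing path dominates it; otherwise keep it
    # and drop any existing paths it dominates.
    if any(dominates(old, new) for old in paths):
        return paths
    return [old for old in paths if not dominates(new, old)] + [new]

paths: list = []
for cand in (Path(150.0, False),   # hash-based unique-ification
             Path(160.0, True),    # sort-based, output ordered on the key
             Path(200.0, True)):   # sort-based on a worse input path
    paths = add_path(paths, cand)

print(paths)  # the hash path and the cheaper ordered path both survive
```

Both the cheap hash path and the ordered sort path are retained, since neither dominates the other, while the strictly worse third candidate is discarded; upper-level planning then decides which survivor a given consumer prefers.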
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 6 +-
src/backend/optimizer/path/joinpath.c | 345 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +----------
src/backend/optimizer/plan/planner.c | 522 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 66 ++-
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 15 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1062 insertions(+), 969 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..2da6880c152 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3937,7 +3937,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* The whole issue is moot if we are working from a unique-ified outer
* input, or if we know we don't need to mark/restore at all.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (IsA(outer_path, UniquePath) ||
+ IsA(outer_path, AggPath) ||
+ path->skip_mark_restore)
rescannedtuples = 0;
else
{
@@ -4332,7 +4334,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ebedc5574ca..5a2b2bbefdb 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,12 +112,12 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code, as well as match_unsorted_outer, may need to recognize that
+ * it's dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
*/
void
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -165,10 +166,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* reduce_unique_semijoins would've simplified it), so there's no point in
* calling innerrel_is_unique. However, if the LHS covers all of the
* semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -199,6 +200,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -329,7 +337,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -339,7 +347,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1364,7 +1372,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1402,38 +1409,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1441,7 +1426,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,13 +1565,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1636,10 +1617,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1819,7 +1796,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1855,12 +1831,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1873,24 +1843,27 @@ match_unsorted_outer(PlannerInfo *root,
* If inner_cheapest_total is parameterized by the outer rel, ignore it;
* we will consider it below as a member of cheapest_parameterized_paths,
* but the other possibilities considered in this routine aren't usable.
+ *
+ * Furthermore, if the inner side is a unique-ified relation, we cannot
+ * generate any valid paths here, because the inner rel's dependency on
+ * the outer rel makes unique-ification meaningless.
*/
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+ {
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
+ /*
+ * When the nominal jointype is JOIN_INNER, sjinfo->jointype is
+ * JOIN_SEMI, and the inner rel is exactly the RHS of the semijoin, it
+ * indicates that the inner side is a unique-ified relation.
+ */
+ if (jointype == JOIN_INNER &&
+ extra->sjinfo->jointype == JOIN_SEMI &&
+ bms_equal(extra->sjinfo->syn_righthand, innerrel->relids))
return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
}
- else if (nestjoinOK)
+
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1887,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1895,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1946,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2019,41 +1960,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2053,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2097,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2227,7 +2138,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2290,6 +2200,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,114 +2213,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2421,11 +2283,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2441,19 +2301,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..535248aa525 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -886,6 +885,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -991,14 +992,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1006,10 +1006,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0b61aef962c..6752e1bd902 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -293,9 +290,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -467,19 +464,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1710,207 +1697,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2268,13 +2054,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2288,7 +2074,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6821,61 +6608,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6938,7 +6676,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..cb5a9debfc0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4917,10 +4923,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5111,10 +5117,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8248,3 +8254,503 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * Clear all the path-related fields; fresh paths are built below.
+ */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions over the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ Path *cheapest_input_path = input_rel->cheapest_total_path;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_input_path->rows,
+ NULL,
+ NULL);
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_input_path)
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_input_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ cheapest_input_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do unless the input rel is parallel-safe with partial paths */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /*
+ * Clear all the path-related fields; fresh paths are built below.
+ */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ partial_unique_rel->reltarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..2ee06dc7317 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1712,246 +1710,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2003,36 +1761,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2790,8 +2518,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +2773,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +2820,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3171,13 +2896,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3186,21 +2908,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +2977,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+ JOIN_UNIQUE_OUTER, /* LHS has to be made unique */
+ JOIN_UNIQUE_INNER, /* RHS has to be made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..45f0b9c8ee9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -700,8 +700,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -764,6 +762,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -924,7 +937,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1002,6 +1014,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1739,8 +1761,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2137,34 +2159,6 @@ typedef struct MemoizePath
* if unknown */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2371,17 +2365,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..71d2945b175 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
double calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 390aabfb34b..aebdf391ad9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9468,23 +9468,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Merge Join
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ Merge Cond: (t3.thousand = t1.unique1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
+ -> Index Only Scan using onek_unique1 on public.onek t1
Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t1.unique1 < 1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 40d8056fcea..66732f9b866 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Hash Cond: (t2.a = ((t3.b + 1)))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ -> HashAggregate
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index fec38ef85a6..a93fd222441 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7544e7c5073..27bb76d1ea0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3154,7 +3154,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueRelInfo
UniqueState
UnlistenStmt
@@ -3170,7 +3169,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
On Thu, Jul 3, 2025 at 7:06 PM Richard Guo <guofenglinux@gmail.com> wrote:
This patch does not apply again, so here is a new rebase.
This version also fixes an issue related to parameterized paths: if
the RHS has LATERAL references to the LHS, unique-ification becomes
meaningless because the RHS depends on the LHS, and such paths should
not be generated.
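To illustrate (a hypothetical example reusing the t(a, b) table from upthread, not one of the patch's regression tests), a RHS with a LATERAL reference to the LHS looks like this:

```sql
-- Hypothetical illustration: the subquery references t1.b, so its output
-- varies per outer row; a unique-ified copy of the RHS cannot be computed
-- once up front and then joined with a plain inner join.
select * from t t1
where t1.a in (select t2.a from t t2 where t2.b = t1.b);
```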
(The cc list is somehow lost; re-ccing.)
FWIW, I noticed that the row/cost estimates for the unique-ification
node on master can be very wrong. For example:
create table t(a int, b int);
insert into t select i%100, i from generate_series(1,10000)i;
vacuum analyze t;
set enable_hashagg to off;
explain (costs on)
select * from t t1, t t2 where (t1.a, t2.b) in
(select a, b from t t3 where t1.b is not null offset 0);
And look at the snippet from the plan:
(on master)
-> Unique (cost=934.39..1009.39 rows=10000 width=8)
-> Sort (cost=271.41..271.54 rows=50 width=8)
Sort Key: "ANY_subquery".a, "ANY_subquery".b
-> Subquery Scan on "ANY_subquery" (cost=0.00..270.00 rows=50 width=8)
The row estimate for the subpath is 50, but it increases to 10000
after unique-ification. How does that make sense?
This issue does not occur with this patch:
(on patched)
-> Unique (cost=271.41..271.79 rows=50 width=8)
-> Sort (cost=271.41..271.54 rows=50 width=8)
Sort Key: "ANY_subquery".a, "ANY_subquery".b
-> Subquery Scan on "ANY_subquery" (cost=0.00..270.00 rows=50 width=8)
Thanks
Richard
On Fri, Jul 4, 2025 at 10:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jul 3, 2025 at 7:06 PM Richard Guo <guofenglinux@gmail.com> wrote:
This patch does not apply again, so here is a new rebase.
This version also fixes an issue related to parameterized paths: if
the RHS has LATERAL references to the LHS, unique-ification becomes
meaningless because the RHS depends on the LHS, and such paths should
not be generated.
(The cc list is somehow lost; re-ccing.)
CI reports a test failure for this patch, although I'm quite
confident it's unrelated to the changes introduced here.
The failure is: recovery/009_twophase time out (After 1000 seconds)
In any case, here's a freshly rebased version.
Hi Tom, I wonder if you've had a chance to look at this patch. It
would be great to have your input.
Thanks
Richard
Attachments:
v5-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch
From 4c0e3c42b9c3fe39f20e81a843f37eaaefd8f96e Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v5] Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed.  In addition, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
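The redundant-sort case this patch targets can be reproduced with the
example from upthread; since sort-based unique-ification can now exploit
presorted input paths, the extra Sort nodes around the Unique should no
longer be needed:

```sql
-- Reproduction from upthread: with enable_hashagg off, the pre-patch
-- plan adds two Sort nodes around the Unique even though the index
-- t_a_idx already delivers t2.a in sorted order.
create table t(a int, b int);
insert into t select i%100, i from generate_series(1,10000) i;
create index on t(a);
vacuum analyze t;
set enable_hashagg to off;

explain (costs off)
select * from t t1 where t1.a in
  (select a from t t2 where a < 10)
order by t1.a;
```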
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 6 +-
src/backend/optimizer/path/joinpath.c | 345 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +----------
src/backend/optimizer/plan/planner.c | 518 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 66 ++-
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 15 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1058 insertions(+), 969 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1f04a2c182c..fb4b9310c78 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3963,7 +3963,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* The whole issue is moot if we are working from a unique-ified outer
* input, or if we know we don't need to mark/restore at all.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (IsA(outer_path, UniquePath) ||
+ IsA(outer_path, AggPath) ||
+ path->skip_mark_restore)
rescannedtuples = 0;
else
{
@@ -4358,7 +4360,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ebedc5574ca..5a2b2bbefdb 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,12 +112,12 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code, as well as match_unsorted_outer, may need to recognize that
+ * it's dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
*/
void
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -165,10 +166,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* reduce_unique_semijoins would've simplified it), so there's no point in
* calling innerrel_is_unique. However, if the LHS covers all of the
* semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -199,6 +200,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -329,7 +337,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -339,7 +347,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1364,7 +1372,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1402,38 +1409,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1441,7 +1426,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,13 +1565,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1636,10 +1617,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1819,7 +1796,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1855,12 +1831,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1873,24 +1843,27 @@ match_unsorted_outer(PlannerInfo *root,
* If inner_cheapest_total is parameterized by the outer rel, ignore it;
* we will consider it below as a member of cheapest_parameterized_paths,
* but the other possibilities considered in this routine aren't usable.
+ *
+ * Furthermore, if the inner side is a unique-ified relation, we cannot
+ * generate any valid paths here, because the inner rel's dependency on
+ * the outer rel makes unique-ification meaningless.
*/
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+ {
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
+ /*
+ * When the nominal jointype is JOIN_INNER, sjinfo->jointype is
+ * JOIN_SEMI, and the inner rel is exactly the RHS of the semijoin, it
+ * indicates that the inner side is a unique-ified relation.
+ */
+ if (jointype == JOIN_INNER &&
+ extra->sjinfo->jointype == JOIN_SEMI &&
+ bms_equal(extra->sjinfo->syn_righthand, innerrel->relids))
return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
}
- else if (nestjoinOK)
+
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1887,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1895,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1946,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2019,41 +1960,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2053,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2097,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2227,7 +2138,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2290,6 +2200,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,114 +2213,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2421,11 +2283,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2441,19 +2301,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..535248aa525 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -886,6 +885,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -991,14 +992,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1006,10 +1006,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8a9f1d7a943..aaa77ecbcf9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -293,9 +290,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -467,19 +464,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1760,207 +1747,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2318,13 +2104,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2338,7 +2124,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6871,61 +6658,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6988,7 +6726,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..8efd3f874f9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4917,10 +4923,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5111,10 +5117,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8248,3 +8254,499 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /* clear path info */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
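As an aside on the approach here: the intent is to build candidate unique paths for both strategies and let add_path() arbitrate on cost and ordering, rather than picking hash vs. sort up front. A rough Python model of that dominance rule (a simplified sketch of our own, not planner code; the prefix comparison stands in for pathkeys containment):

```python
from dataclasses import dataclass

@dataclass
class Path:
    strategy: str            # "hashed-unique" or "sorted-unique"
    total_cost: float
    pathkeys: tuple = ()     # sort order the path provides, () if none

def dominates(a, b):
    # 'a' dominates 'b' if it is no more expensive and provides at least
    # the ordering that 'b' provides (prefix comparison, like pathkeys)
    return (a.total_cost <= b.total_cost
            and b.pathkeys == a.pathkeys[:len(b.pathkeys)])

def add_path(pathlist, new):
    # discard 'new' if an existing path dominates it ...
    if any(dominates(old, new) for old in pathlist):
        return pathlist
    # ... otherwise keep it and drop any paths it dominates
    return [old for old in pathlist if not dominates(new, old)] + [new]
```

Under this rule a cheaper hashed path and a costlier sorted path can coexist, since the sorted one may save a Sort above the join — exactly the opportunity the old heuristic-based create_unique_path() misses.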
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ Path *cheapest_input_path = input_rel->cheapest_total_path;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_input_path->rows,
+ NULL,
+ NULL);
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_input_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_input_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ cheapest_input_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ cheapest_input_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /* clear path info */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ cheapest_partial_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
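The partial-path flow above — per-worker unique-ification, a Gather to combine the streams, then a final unique step over the combined result — can be illustrated with a toy pipeline (purely conceptual; assumes hashable, sortable rows):

```python
def partial_unique(chunks):
    # each parallel worker unique-ifies only its own chunk
    return [sorted(set(chunk)) for chunk in chunks]

def gather(partials):
    # Gather merely concatenates worker output; duplicates can survive
    # when the same value lands in more than one worker
    out = []
    for p in partials:
        out.extend(p)
    return out

def final_unique(rows):
    # the final unique step removes cross-worker duplicates
    return sorted(set(rows))

workers = [[1, 2, 2, 3], [3, 3, 4], [1, 5]]
gathered = gather(partial_unique(workers))   # may still contain duplicates
result = final_unique(gathered)
```

This is why create_partial_unique_paths() must run create_final_unique_paths() on top of the gathered result: per-worker unique-ification alone cannot guarantee global uniqueness.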
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9cc602788ea..7cdad1db99f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1735,246 +1733,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2026,36 +1784,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2813,8 +2541,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3069,8 +2796,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3117,8 +2843,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3194,13 +2919,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3209,21 +2931,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3279,8 +3000,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+ JOIN_UNIQUE_OUTER, /* LHS has to be made unique */
+ JOIN_UNIQUE_INNER, /* RHS has to be made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..45f0b9c8ee9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -700,8 +700,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -764,6 +762,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -924,7 +937,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1002,6 +1014,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1739,8 +1761,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2137,34 +2159,6 @@ typedef struct MemoizePath
* if unknown */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2371,17 +2365,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..71d2945b175 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
double calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 46ddfa844c5..6bfea78dbaf 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9468,23 +9468,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Merge Join
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ Merge Cond: (t3.thousand = t1.unique1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
+ -> Index Only Scan using onek_unique1 on public.onek t1
Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t1.unique1 < 1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 18fed63e738..0563d0cd5a1 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, (t3.a + 1)
+ Hash Cond: (t2.a = (t3.b + 1))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index d9a841fbc9f..a6d276a115b 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff050e93a50..62eb4f332c5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3155,7 +3155,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueRelInfo
UniqueState
UnlistenStmt
@@ -3171,7 +3170,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
Hello,
As a very trivial test on this patch, I ran the query in your opening
email, both with and without the patch, scaling up the size of the table
a little bit. So I did this:
drop table if exists t;
create table t(a int, b int);
insert into t select i % 100000, i from generate_series(1,1e7) i;
create index on t(a);
vacuum analyze t;
set enable_hashagg to off;
explain (costs off, analyze, buffers)
select * from t t1 where t1.a in
(select a from t t2 where a < 10000)
order by t1.a;
This is the plan without the patch:
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Merge Join (actual time=289.262..700.761 rows=1000000.00 loops=1)
Merge Cond: (t1.a = t2.a)
Buffers: shared hit=1017728 read=3945 written=3361, temp read=1471 written=1476
-> Index Scan using t_a_idx on t t1 (actual time=0.011..320.747 rows=1000001.00 loops=1)
Index Searches: 1
Buffers: shared hit=997725 read=3112 written=2664
-> Sort (actual time=219.273..219.771 rows=10000.00 loops=1)
Sort Key: t2.a
Sort Method: quicksort Memory: 385kB
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Unique (actual time=128.173..218.708 rows=10000.00 loops=1)
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Sort (actual time=128.170..185.461 rows=1000000.00 loops=1)
Sort Key: t2.a
Sort Method: external merge Disk: 11768kB
Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
-> Index Only Scan using t_a_idx on t t2 (actual time=0.024..74.171 rows=1000000.00 loops=1)
Index Cond: (a < 10000)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=20003 read=833 written=697
Planning:
Buffers: shared hit=28 read=7
Planning Time: 0.212 ms
Execution Time: 732.840 ms
and this is the plan with the patch:
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────
Merge Join (actual time=70.310..595.116 rows=1000000.00 loops=1)
Merge Cond: (t1.a = t2.a)
Buffers: shared hit=1017750 read=3923 written=3586
-> Index Scan using t_a_idx on t t1 (actual time=0.020..341.257 rows=1000001.00 loops=1)
Index Searches: 1
Buffers: shared hit=996914 read=3923 written=3586
-> Unique (actual time=0.028..99.074 rows=10000.00 loops=1)
Buffers: shared hit=20836
-> Index Only Scan using t_a_idx on t t2 (actual time=0.026..66.219 rows=1000000.00 loops=1)
Index Cond: (a < 10000)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=20836
Planning:
Buffers: shared hit=55 read=15 written=14
Planning Time: 0.391 ms
Execution Time: 621.377 ms
This is a really nice improvement. I think we could find queries that
are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
nodes.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We need no flags
We recognize no borders" (Jorge González)
On Wed, Jul 23, 2025 at 5:11 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
As a very trivial test on this patch, I ran the query in your opening
email, both with and without the patch, scaling up the size of the table
a little bit.
This is a really nice improvement. I think we could find queries that
are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
nodes.
Thank you for testing this patch!
In addition to eliminating unnecessary sort nodes, this patch also
allows us to exploit pre-sorted paths that aren't necessarily the
cheapest total during the unique-ification step. It also allows the
use of parallel plans for that step on large tables. I think we could
also find queries that become faster as a result of these improvements.
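To illustrate why keeping a pricier-but-presorted path around can pay off, here is a toy Pareto filter in Python. This is a deliberate simplification of what add_path() does (the Path class, the path names, and the single boolean presorted flag are inventions for illustration; the real function also compares startup cost, parameterization, parallel safety, and rowcounts):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    name: str
    total_cost: float
    presorted: bool  # already produces the sort order wanted above?

def add_path_toy(paths, new):
    """Keep 'new' unless some existing path is at least as cheap and
    at least as well sorted (and strictly better on one of the two);
    likewise drop existing paths that 'new' dominates."""
    def dominates(a, b):
        no_worse = (a.total_cost <= b.total_cost
                    and (a.presorted or not b.presorted))
        better = (a.total_cost < b.total_cost
                  or (a.presorted and not b.presorted))
        return no_worse and better
    if any(dominates(p, new) for p in paths):
        return paths
    return [p for p in paths if not dominates(new, p)] + [new]

paths = []
paths = add_path_toy(paths, Path("cheapest, unsorted", 100.0, False))
# Pricier but presorted: survives, and may avoid a Sort higher up.
paths = add_path_toy(paths, Path("presorted", 120.0, True))
# Explicit sort on top of the cheapest path: dominated by the
# presorted path (same sortedness, higher cost), so it is rejected.
paths = add_path_toy(paths, Path("sort on top of cheapest", 150.0, True))
```

The point is that the presorted path is retained even though it is not the cheapest in total cost, which is exactly the opportunity the old cheapest-total-only unique-ification missed.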
Thanks
Richard
Hi Richard,
Thanks for the patch! I applied your patch and played around with it.
I have a question about the following test case you added in
subselect.sql:
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, (t3.a + 1)
+ Hash Cond: (t2.a = (t3.b + 1))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
I was under the impression that we wanted Unique on top of a sorted
node for the inner side of the SEMI join. However, the expected output uses
a HashAgg instead. Is this expected?
While looking at the code, I also had a question about the following
changes in costsize.c:
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3963,7 +3963,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath
*path,
* The whole issue is moot if we are working from a unique-ified outer
* input, or if we know we don't need to mark/restore at all.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (IsA(outer_path, UniquePath) ||
+ IsA(outer_path, AggPath) ||
+ path->skip_mark_restore)
and
@@ -4358,7 +4360,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))
I'm curious why AggPath was added in these two cases. To investigate,
I reverted these two changes regarding AggPath, and reran make
installcheck, and got the following diff:
diff -U3 /Users/alex.wang/workspace/postgres/src/test/regress/expected/subselect.out /Users/alex.wang/workspace/postgres/src/test/regress/results/subselect.out
--- /Users/alex.wang/workspace/postgres/src/test/regress/expected/subselect.out	2025-07-30 14:47:02
+++ /Users/alex.wang/workspace/postgres/src/test/regress/results/subselect.out	2025-07-30 17:35:33
@@ -747,33 +747,32 @@
select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
order by t1.a, t2.a;
- QUERY PLAN
-------------------------------------------------------------------------------------------------
- Incremental Sort
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Merge Join
Output: t1.a, t1.b, t2.a, t2.b
- Sort Key: t1.a, t2.a
- Presorted Key: t1.a
- -> Merge Join
- Output: t1.a, t1.b, t2.a, t2.b
- Merge Cond: (t1.a = ((t3.a + 1)))
+ Merge Cond: ((t3.a + 1) = t1.a)
+ -> Nested Loop
+ Output: t2.a, t2.b, t3.a
+ -> Unique
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ -> Sort
+ Output: t3.a, t3.b, ((t3.a + 1)), ((t3.b + 1))
+ Sort Key: ((t3.a + 1)), ((t3.b + 1))
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: (t3.b + 1)
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = (t3.b + 1))
+ -> Materialize
+ Output: t1.a, t1.b
-> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
Output: t1.a, t1.b
- -> Sort
- Output: t2.a, t2.b, t3.a, ((t3.a + 1))
- Sort Key: ((t3.a + 1))
- -> Hash Join
- Output: t2.a, t2.b, t3.a, (t3.a + 1)
- Hash Cond: (t2.a = (t3.b + 1))
- -> Seq Scan on public.semijoin_unique_tbl t2
- Output: t2.a, t2.b
- -> Hash
- Output: t3.a, t3.b
- -> HashAggregate
- Output: t3.a, t3.b
- Group Key: (t3.a + 1), (t3.b + 1)
- -> Seq Scan on public.semijoin_unique_tbl t3
- Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
-(24 rows)
+(23 rows)
-- encourage use of parallel plans
set parallel_setup_cost=0;
@@ -2818,21 +2817,23 @@
SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd)
WHERE a.thousand < 750;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------
Hash Join
Hash Cond: (c.odd = b.odd)
- -> Hash Join
- Hash Cond: (a.hundred = c.hundred)
- -> Seq Scan on tenk1 a
- Filter: (thousand < 750)
- -> Hash
- -> HashAggregate
- Group Key: c.odd, c.hundred
- -> Seq Scan on tenk2 c
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+ -> Memoize
+ Cache Key: c.hundred
+ Cache Mode: logical
+ -> Index Scan using tenk1_hundred on tenk1 a
+ Index Cond: (hundred = c.hundred)
+ Filter: (thousand < 750)
-> Hash
-> Seq Scan on tenk2 b
-(12 rows)
+(14 rows)
The second diff looks fine to me. However, the first diff in the
subselect test happened to be the test that I asked about.
Here's the comparison of the two EXPLAIN ANALYZE results:
-- setup:
create table semijoin_unique_tbl (a int, b int);
insert into semijoin_unique_tbl select i%10, i%10 from
generate_series(1,1000)i;
create index on semijoin_unique_tbl(a, b);
analyze semijoin_unique_tbl;
-- query:
EXPLAIN ANALYZE
select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
order by t1.a, t2.a;
-- results:
Output of your original v5 patch:
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=93.94..297.51 rows=2500 width=16) (actual time=4.153..25.257 rows=90000.00 loops=1)
   Sort Key: t1.a, t2.a
   Presorted Key: t1.a
   Full-sort Groups: 9  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
   Pre-sorted Groups: 9  Sort Method: quicksort  Average Memory: 697kB  Peak Memory: 697kB
   Buffers: shared hit=61
   ->  Merge Join  (cost=74.81..166.49 rows=2500 width=16) (actual time=0.964..13.341 rows=90000.00 loops=1)
         Merge Cond: (t1.a = ((t3.a + 1)))
         Buffers: shared hit=61
         ->  Index Only Scan using semijoin_unique_tbl_a_b_idx on semijoin_unique_tbl t1  (cost=0.15..43.08 rows=1000 width=8) (actual time=0.040..0.276 rows=1000.00 loops=1)
               Heap Fetches: 1000
               Index Searches: 1
               Buffers: shared hit=51
         ->  Sort  (cost=74.66..75.91 rows=500 width=12) (actual time=0.867..4.366 rows=89901.00 loops=1)
               Sort Key: ((t3.a + 1))
               Sort Method: quicksort  Memory: 53kB
               Buffers: shared hit=10
               ->  Hash Join  (cost=27.25..52.25 rows=500 width=12) (actual time=0.401..0.711 rows=900.00 loops=1)
                     Hash Cond: (t2.a = (t3.b + 1))
                     Buffers: shared hit=10
                     ->  Seq Scan on semijoin_unique_tbl t2  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.027..0.106 rows=1000.00 loops=1)
                           Buffers: shared hit=5
                     ->  Hash  (cost=26.00..26.00 rows=100 width=8) (actual time=0.361..0.361 rows=10.00 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           Buffers: shared hit=5
                           ->  HashAggregate  (cost=25.00..26.00 rows=100 width=8) (actual time=0.349..0.352 rows=10.00 loops=1)
                                 Group Key: (t3.a + 1), (t3.b + 1)
                                 Batches: 1  Memory Usage: 32kB
                                 Buffers: shared hit=5
                                 ->  Seq Scan on semijoin_unique_tbl t3  (cost=0.00..20.00 rows=1000 width=16) (actual time=0.012..0.150 rows=1000.00 loops=1)
                                       Buffers: shared hit=5
 Planning Time: 0.309 ms
 Execution Time: 28.552 ms
(33 rows)
Output of the v5 patch + my modification that removes the changes in
costsize.c about AggPath:
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=70.14..316.44 rows=2500 width=16) (actual time=0.862..13.484 rows=90000.00 loops=1)
   Merge Cond: ((t3.a + 1) = t1.a)
   Buffers: shared hit=105 read=6
   ->  Nested Loop  (cost=69.99..227.12 rows=500 width=12) (actual time=0.778..1.225 rows=900.00 loops=1)
         Buffers: shared hit=54 read=6
         ->  Unique  (cost=69.83..77.33 rows=100 width=16) (actual time=0.685..0.782 rows=10.00 loops=1)
               Buffers: shared read=5
               ->  Sort  (cost=69.83..72.33 rows=1000 width=16) (actual time=0.684..0.723 rows=1000.00 loops=1)
                     Sort Key: ((t3.a + 1)), ((t3.b + 1))
                     Sort Method: quicksort  Memory: 56kB
                     Buffers: shared read=5
                     ->  Seq Scan on semijoin_unique_tbl t3  (cost=0.00..20.00 rows=1000 width=16) (actual time=0.324..0.479 rows=1000.00 loops=1)
                           Buffers: shared read=5
         ->  Memoize  (cost=0.16..2.19 rows=100 width=8) (actual time=0.010..0.035 rows=90.00 loops=10)
               Cache Key: (t3.b + 1)
               Cache Mode: logical
               Hits: 0  Misses: 10  Evictions: 0  Overflows: 0  Memory Usage: 36kB
               Buffers: shared hit=54 read=1
               ->  Index Only Scan using semijoin_unique_tbl_a_b_idx on semijoin_unique_tbl t2  (cost=0.15..2.18 rows=100 width=8) (actual time=0.005..0.015 rows=90.00 loops=10)
                     Index Cond: (a = (t3.b + 1))
                     Heap Fetches: 900
                     Index Searches: 10
                     Buffers: shared hit=54 read=1
   ->  Materialize  (cost=0.15..45.58 rows=1000 width=8) (actual time=0.014..3.613 rows=90001.00 loops=1)
         Storage: Memory  Maximum Storage: 20kB
         Buffers: shared hit=51
         ->  Index Only Scan using semijoin_unique_tbl_a_b_idx on semijoin_unique_tbl t1  (cost=0.15..43.08 rows=1000 width=8) (actual time=0.010..0.126 rows=1000.00 loops=1)
               Heap Fetches: 1000
               Index Searches: 1
               Buffers: shared hit=51
 Planning:
   Buffers: shared hit=69 read=20
 Planning Time: 3.558 ms
 Execution Time: 17.211 ms
(34 rows)
While both runs are fast (28ms vs. 17ms), the plan generated by the v5
patch is slower in this case.
The latter plan also seems closer to my expectation: Unique + Sort for
the inner side of the SEMI join.
What do you think?
Best,
Alex
On Thu, Jul 31, 2025 at 10:33 AM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
Thanks for the patch! I applied your patch and played around with it.
Thanks for trying out this patch.
I have a question about the following test case you added in
subselect.sql:
I was under the impression that we wanted Unique on top of a sorted
node for the inner side of the SEMI join. However, the expected output uses
a HashAgg instead. Is this expected?
Hmm, I don't think there's any expectation that "Sort+Unique" should
be used for the unique-ification step in this query. This patch
considers both hash-based and sort-based implementations, and lets the
cost comparison determine which one is preferable. So I wouldn't be
surprised if the planner ends up choosing the hash-based
implementation in the final plan.
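For intuition, the two unique-ification strategies can be caricatured like this (a toy Python sketch, not planner or executor code; the key point is that only the sort-based variant yields ordered output that can satisfy pathkeys above the join):

```python
def unique_via_sort(rows, key):
    """Sort-based unique-ification: sort, then drop adjacent
    duplicates (Sort + Unique, or just Unique on presorted input).
    The output comes out sorted as a side effect."""
    out = []
    for row in sorted(rows, key=key):
        if not out or key(out[-1]) != key(row):
            out.append(row)
    return out

def unique_via_hash(rows, key):
    """Hash-based unique-ification (HashAggregate): no sort needed,
    but the output order is arbitrary."""
    seen = {}
    for row in rows:
        seen.setdefault(key(row), row)
    return list(seen.values())
```

Both produce the same distinct set, so add_path() is free to pick whichever is cheaper for the plan as a whole.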
While looking at the code, I also had a question about the following
changes in costsize.c:

--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3963,7 +3963,9 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
      * The whole issue is moot if we are working from a unique-ified outer
      * input, or if we know we don't need to mark/restore at all.
      */
-    if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+    if (IsA(outer_path, UniquePath) ||
+        IsA(outer_path, AggPath) ||
+        path->skip_mark_restore)

and

@@ -4358,7 +4360,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
      * because we avoid contaminating the cache with a value that's wrong for
      * non-unique-ified paths.
      */
-    if (IsA(inner_path, UniquePath))
+    if (IsA(inner_path, UniquePath) || IsA(inner_path, AggPath))

I'm curious why AggPath was added in these two cases.
Well, in final_cost_hashjoin() and final_cost_mergejoin(), we have
some special cases when the inner or outer relation has been
unique-ified. Previously, it was sufficient to check whether the path
was a UniquePath, since both hash-based and sort-based implementations
were represented that way. However, with this patch, UniquePath now
only represents the sort-based implementation, so we also need to
check for AggPath to account for the hash-based case.
The second diff looks fine to me. However, the first diff in the
subselect test happened to be the test that I asked about.

Here's the comparison of the two EXPLAIN ANALYZE results:
While both runs are fast (28ms vs. 17ms), the plan generated by the v5
patch is slower in this case.
This is a very interesting observation. In fact, with the original v5
patch, you can produce both plans by setting enable_hashagg on and
off.
set enable_hashagg to on;
Incremental Sort (cost=91.95..277.59 rows=2500 width=16)
(actual time=31.960..147.040 rows=90000.00 loops=1)
set enable_hashagg to off;
Merge Join (cost=70.14..294.34 rows=2500 width=16)
(actual time=4.303..71.891 rows=90000.00 loops=1)
This seems to be another case where the planner chooses a suboptimal
plan due to inaccurate cost estimates.
Thanks
Richard
On Thu, Jul 31, 2025 at 1:08 PM Richard Guo <guofenglinux@gmail.com> wrote:
BTW, maybe a better way to determine whether a relation has been
unique-ified is to check that the nominal jointype is JOIN_INNER and
sjinfo->jointype is JOIN_SEMI, and the relation is exactly the RHS of
the semijoin. This approach is mentioned in a comment in joinpath.c:
* Path cost estimation code may need to recognize that it's
* dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
... but it seems we don't currently apply it in costsize.c.
To be concrete, I'm imagining a check like the following:
#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
... and then the check in final_cost_hashjoin() becomes:
if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
path->jpath.jointype))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
}
Would this be a better approach? Any thoughts?
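Modeled with relids as frozensets, the proposed test reduces to the following (a hypothetical Python rendering for illustration only; the real check operates on Bitmapsets and SpecialJoinInfo):

```python
JOIN_INNER, JOIN_SEMI = "JOIN_INNER", "JOIN_SEMI"

def is_uniqueified_rel(rel_relids, sj_jointype, sj_syn_righthand,
                       nominal_jointype):
    # A semijoin implemented by unique-ifying its RHS is planned as a
    # nominal INNER join whose SpecialJoinInfo still says JOIN_SEMI;
    # the rel qualifies only if it is exactly that semijoin's RHS.
    return (nominal_jointype == JOIN_INNER
            and sj_jointype == JOIN_SEMI
            and rel_relids == sj_syn_righthand)
```

Note that a rel covering more than the semijoin's RHS does not qualify, which the bms_equal() test enforces.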
Thanks
Richard
Hi,
This is a very interesting observation. In fact, with the original v5
patch, you can produce both plans by setting enable_hashagg on and
off.
set enable_hashagg to on;
Incremental Sort  (cost=91.95..277.59 rows=2500 width=16)
  (actual time=31.960..147.040 rows=90000.00 loops=1)

set enable_hashagg to off;
Merge Join  (cost=70.14..294.34 rows=2500 width=16)
  (actual time=4.303..71.891 rows=90000.00 loops=1)

This seems to be another case where the planner chooses a suboptimal
plan due to inaccurate cost estimates.
Agree. I increased the row count and tried enable_hashagg both on and
off; there's no difference in execution time, and the execution plan is
the same.
#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
    ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
     bms_equal((sjinfo)->syn_righthand, (rel)->relids))

... and then the check in final_cost_hashjoin() becomes:

    if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
                           path->jpath.jointype))
    {
        innerbucketsize = 1.0 / virtualbuckets;
        innermcvfreq = 0.0;
    }

Would this be a better approach? Any thoughts?
This approach does indeed more accurately capture the fact that the
relation has been unique-ified, especially in cases where a semi join has
been transformed into an inner join. Compared to the current heuristic
checks in costsize.c that rely on inner_path->rows, this method is more
semantically meaningful and robust.
On Thu, Jul 31, 2025 at 9:49 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
This seems to be another case where the planner chooses a suboptimal
plan due to inaccurate cost estimates.
Agree. I increased the row count and tried enable_hashagg both on and off; there's no difference in execution time, and the execution plan is the same.
Yeah, I found that if you increase the total number of rows in the
table from 1000 to 1083 or more, you consistently get the more
efficient plan -- regardless of whether enable_hashagg is on or off.
#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
    ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
     bms_equal((sjinfo)->syn_righthand, (rel)->relids))

... and then the check in final_cost_hashjoin() becomes:

    if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
                           path->jpath.jointype))
    {
        innerbucketsize = 1.0 / virtualbuckets;
        innermcvfreq = 0.0;
    }

Would this be a better approach? Any thoughts?

This approach does indeed more accurately capture the fact that the
relation has been unique-ified, especially in cases where a semi join
has been transformed into an inner join. Compared to the current
heuristic checks in costsize.c that rely on inner_path->rows, this
method is more semantically meaningful and robust.
The current check in costsize.c relies on the path type, not the path
rows as you mentioned. However, I agree that the check I proposed is
more robust and extensible: if additional path types are introduced to
represent unique-ification, this check wouldn't need to change. So I
plan to go this way, unless there are any objections.
Thanks
Richard
Hi Richard,
On Wed, Jul 30, 2025 at 9:08 PM Richard Guo <guofenglinux@gmail.com> wrote:
Thanks for explaining! It makes sense now!
While reviewing the code, the following diff concerns me:
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
What has changed that enabled JOIN_UNIQUE_OUTER to handle partial
paths? Or is it no longer possible for the outer path to be partial?
Best,
Alex
On Fri, Aug 1, 2025 at 11:58 PM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
While reviewing the code, the following diff concerns me:

     if (joinrel->consider_parallel &&
-        save_jointype != JOIN_UNIQUE_OUTER &&
-        save_jointype != JOIN_FULL &&
-        save_jointype != JOIN_RIGHT &&
-        save_jointype != JOIN_RIGHT_ANTI &&
+        jointype != JOIN_FULL &&
+        jointype != JOIN_RIGHT &&
+        jointype != JOIN_RIGHT_ANTI &&
         outerrel->partial_pathlist != NIL &&
         bms_is_empty(joinrel->lateral_relids))

What has changed that enabled JOIN_UNIQUE_OUTER to handle partial
paths? Or is it no longer possible for the outer path to be partial?
It's the latter, as indicated by the Assert in create_unique_paths():
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
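The reason is easy to see with a toy model: if uniqueness were enforced over a partial path, each worker would deduplicate only its own chunk of the rows, and duplicates that land in different workers' chunks would survive the Gather (illustrative Python, not executor code):

```python
def parallel_unique(rows, nworkers):
    """Toy model of why uniqueness can't be guaranteed over a partial
    path: each worker deduplicates only its own chunk, so duplicates
    split across workers survive the Gather."""
    chunks = [rows[i::nworkers] for i in range(nworkers)]
    gathered = []
    for chunk in chunks:
        gathered.extend(dict.fromkeys(chunk))  # per-worker "Unique"
    return gathered
```

With one worker the dedup is complete, but with two workers the chunks [1, 2, 3] and [1, 2] each look duplicate-free locally, so the gathered result still contains duplicates.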
Thanks
Richard
The v5 patch does not apply anymore, and here is a new rebase. There
are two main changes in v6:
* I choose to use the check I proposed earlier to determine whether a
relation has been unique-ified in costsize.c.
* Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. 0002 removes that loop and
simplifies the function accordingly.
Thanks
Richard
Attachments:
v6-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch (application/octet-stream)
From 2d7a1c8667c4fc58be0c67ca8a1a76fbfb9c9351 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v6 1/2] Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. In addition, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
This patch also removes the UNIQUE_PATH_NOOP related code along the
way, as it is dead code -- if the RHS rel is provably unique, the
semijoin should have already been simplified to a plain inner join by
analyzejoins.c.
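For the example query shown earlier, the plan shape we would hope for with
this patch (illustrative only, not actual EXPLAIN output) is roughly:

```
Merge Join
  Merge Cond: (t1.a = t2.a)
  ->  Index Scan using t_a_idx on t t1
  ->  Unique
        ->  Index Only Scan using t_a_idx on t t2
              Index Cond: (a < 10)
```

That is, the sort-based unique-ification exploits the presorted index-only
scan on the RHS, so both Sort nodes disappear.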
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 11 +-
src/backend/optimizer/path/joinpath.c | 338 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 292 +----------
src/backend/optimizer/plan/planner.c | 518 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 77 +--
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 15 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1065 insertions(+), 971 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 344a3188317..0996f093dd6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3966,10 +3966,12 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* when we should not. Can we do better without expensive selectivity
* computations?
*
- * The whole issue is moot if we are working from a unique-ified outer
- * input, or if we know we don't need to mark/restore at all.
+ * The whole issue is moot if we know we don't need to mark/restore at
+ * all, or if we are working from a unique-ified outer input.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (path->skip_mark_restore ||
+ IS_UNIQUEIFIED_REL(outer_path->parent, extra->sjinfo,
+ path->jpath.jointype))
rescannedtuples = 0;
else
{
@@ -4364,7 +4366,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (IS_UNIQUEIFIED_REL(inner_path->parent, extra->sjinfo,
+ path->jpath.jointype))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ebedc5574ca..de06aa386de 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,12 +112,12 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code, as well as match_unsorted_outer, may need to recognize that
+ * it's dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
*/
void
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -165,10 +166,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* reduce_unique_semijoins would've simplified it), so there's no point in
* calling innerrel_is_unique. However, if the LHS covers all of the
* semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -199,6 +200,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -329,7 +337,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -339,7 +347,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1364,7 +1372,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1402,38 +1409,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1441,7 +1426,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,13 +1565,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1636,10 +1617,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1819,7 +1796,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1855,12 +1831,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1873,24 +1843,20 @@ match_unsorted_outer(PlannerInfo *root,
* If inner_cheapest_total is parameterized by the outer rel, ignore it;
* we will consider it below as a member of cheapest_parameterized_paths,
* but the other possibilities considered in this routine aren't usable.
+ *
+ * Furthermore, if the inner side is a unique-ified relation, we cannot
+ * generate any valid paths here, because the inner rel's dependency on
+ * the outer rel makes unique-ification meaningless.
*/
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+ {
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
+ if (IS_UNIQUEIFIED_REL(innerrel, extra->sjinfo, jointype))
return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
}
- else if (nestjoinOK)
+
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1880,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1888,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1939,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2019,41 +1953,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2046,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2090,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2227,7 +2131,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2290,6 +2193,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,114 +2206,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2421,11 +2276,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2441,19 +2294,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..535248aa525 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -886,6 +885,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -991,14 +992,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1006,10 +1006,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bfefc7dbea1..1c3381b02b8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -296,9 +293,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -470,19 +467,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1764,207 +1751,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2322,13 +2108,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2342,7 +2128,8 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ best_path->path.parent->relids);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6880,61 +6667,12 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input path must already be sorted accordingly.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6997,7 +6735,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d59d6e4c6a0..a90adc790ae 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -267,6 +267,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4906,10 +4912,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5100,10 +5106,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8237,3 +8243,499 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ Path *cheapest_input_path = input_rel->cheapest_total_path;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_input_path->rows,
+ NULL,
+ NULL);
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_input_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_input_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ cheapest_input_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ cheapest_input_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ cheapest_partial_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4c5867cdcb..b0da28150d3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1740,246 +1738,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2031,36 +1789,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2818,8 +2546,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3074,8 +2801,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3122,8 +2848,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3199,13 +2924,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3214,21 +2936,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3284,8 +3005,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..e97566b5938 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -319,8 +319,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+	JOIN_UNIQUE_OUTER,			/* LHS has to be made unique */
+	JOIN_UNIQUE_INNER,			/* RHS has to be made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad2726f026f..51ec5c66b47 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -703,8 +703,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -770,6 +768,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from the relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -930,7 +943,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1004,6 +1016,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1097,6 +1119,17 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is given relation unique-ified?
+ *
+ * When the nominal jointype is JOIN_INNER, sjinfo->jointype is JOIN_SEMI, and
+ * the given rel is exactly the RHS of the semijoin, it indicates that the rel
+ * has been unique-ified.
+ */
+#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
+ ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
+ bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -1741,8 +1774,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2141,34 +2174,6 @@ typedef struct MemoizePath
double est_hit_ratio; /* estimated cache hit ratio, for EXPLAIN */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2375,17 +2380,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 58936e963cb..763cd25bb3c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
Cardinality est_calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4d5d35d0727..98b05c94a11 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9468,23 +9468,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Merge Join
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ Merge Cond: (t3.thousand = t1.unique1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
+ -> Index Only Scan using onek_unique1 on public.onek t1
Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t1.unique1 < 1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 18fed63e738..0563d0cd5a1 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, (t3.a + 1)
+ Hash Cond: (t2.a = (t3.b + 1))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index d9a841fbc9f..a6d276a115b 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..e4a9ec65ab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3159,7 +3159,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueRelInfo
UniqueState
UnlistenStmt
@@ -3175,7 +3174,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
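The central design change in this patch — generating both hash-based and sort-based unique-ification paths and letting add_path() keep whichever ones are useful, instead of picking one method up front with heuristics — can be sketched with a simplified cost-dominance model. This is purely illustrative Python, not PostgreSQL's actual C implementation; the cost numbers and the dominance test are invented stand-ins for the real add_path() logic:

```python
# Simplified model of "build both unique-ification paths, let add_path decide".
# A path survives unless another path is at least as cheap AND provides at
# least as useful a sort order; a costlier sorted path is kept because its
# ordering may save a Sort at a higher join level.
from dataclasses import dataclass, field

@dataclass
class Path:
    desc: str
    total_cost: float
    pathkeys: tuple = ()        # sort order this path's output provides

@dataclass
class RelOptInfo:
    pathlist: list = field(default_factory=list)

def add_path(rel, new):
    # Reject the newcomer if an existing path dominates it.
    for old in rel.pathlist:
        if (old.total_cost <= new.total_cost
                and set(new.pathkeys) <= set(old.pathkeys)):
            return
    # Drop existing paths the newcomer dominates.
    rel.pathlist = [old for old in rel.pathlist
                    if not (new.total_cost <= old.total_cost
                            and set(old.pathkeys) <= set(new.pathkeys))]
    rel.pathlist.append(new)

# Unique-ify the semijoin RHS: offer *both* implementations.
unique_rel = RelOptInfo()
add_path(unique_rel, Path("HashAggregate", total_cost=120.0))
add_path(unique_rel, Path("Sort+Unique", total_cost=150.0, pathkeys=("a",)))

# Both survive: the sorted path costs more but carries an ordering.
print([p.desc for p in unique_rel.pathlist])
```

Under the old scheme, the cheaper HashAggregate would have been chosen unconditionally and the ordered alternative discarded, forcing an extra Sort above the join.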
Attachment: v6-0002-Simplify-relation_has_unique_index_for.patch (application/octet-stream)
From b1d583367e7288c7b4ddd7122954035880864c13 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 1 Aug 2025 18:12:30 +0900
Subject: [PATCH v6 2/2] Simplify relation_has_unique_index_for()
Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. This patch removes that loop
and simplifies the function accordingly.
---
src/backend/optimizer/path/indxpath.c | 85 ++++-------------------
src/backend/optimizer/plan/analyzejoins.c | 5 +-
src/include/optimizer/paths.h | 5 +-
3 files changed, 17 insertions(+), 78 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 601354ea3e0..4f5c98f0091 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4142,47 +4142,26 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* a set of equality conditions, because the conditions constrain all
* columns of some unique index.
*
- * The conditions can be represented in either or both of two ways:
- * 1. A list of RestrictInfo nodes, where the caller has already determined
- * that each condition is a mergejoinable equality with an expression in
- * this relation on one side, and an expression not involving this relation
- * on the other. The transient outer_is_left flag is used to identify which
- * side we should look at: left side if outer_is_left is false, right side
- * if it is true.
- * 2. A list of expressions in this relation, and a corresponding list of
- * equality operators. The caller must have already checked that the operators
- * represent equality. (Note: the operators could be cross-type; the
- * expressions should correspond to their RHS inputs.)
+ * The conditions are provided as a list of RestrictInfo nodes, where the
+ * caller has already determined that each condition is a mergejoinable
+ * equality with an expression in this relation on one side, and an
+ * expression not involving this relation on the other. The transient
+ * outer_is_left flag is used to identify which side we should look at:
+ * left side if outer_is_left is false, right side if it is true.
*
* The caller need only supply equality conditions arising from joins;
* this routine automatically adds in any usable baserestrictinfo clauses.
* (Note that the passed-in restrictlist will be destructively modified!)
+ *
+ * If extra_clauses isn't NULL, return baserestrictinfo clauses which were used
+ * to derive uniqueness.
*/
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist,
- List *exprlist, List *oprlist)
-{
- return relation_has_unique_index_ext(root, rel, restrictlist,
- exprlist, oprlist, NULL);
-}
-
-/*
- * relation_has_unique_index_ext
- * Same as relation_has_unique_index_for(), but supports extra_clauses
- * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
- * which were used to derive uniqueness.
- */
-bool
-relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist,
- List *exprlist, List *oprlist,
- List **extra_clauses)
+ List *restrictlist, List **extra_clauses)
{
ListCell *ic;
- Assert(list_length(exprlist) == list_length(oprlist));
-
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -4225,7 +4204,7 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
}
/* Short-circuit the easy case */
- if (restrictlist == NIL && exprlist == NIL)
+ if (restrictlist == NIL)
return false;
/* Examine each index of the relation ... */
@@ -4247,14 +4226,12 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
continue;
/*
- * Try to find each index column in the lists of conditions. This is
+ * Try to find each index column in the list of conditions. This is
* O(N^2) or worse, but we expect all the lists to be short.
*/
for (c = 0; c < ind->nkeycolumns; c++)
{
- bool matched = false;
ListCell *lc;
- ListCell *lc2;
foreach(lc, restrictlist)
{
@@ -4284,8 +4261,6 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
- matched = true; /* column is unique */
-
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
@@ -4303,43 +4278,11 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
MemoryContextSwitchTo(oldMemCtx);
}
- break;
+ break; /* found a match; column is unique */
}
}
- if (matched)
- continue;
-
- forboth(lc, exprlist, lc2, oprlist)
- {
- Node *expr = (Node *) lfirst(lc);
- Oid opr = lfirst_oid(lc2);
-
- /* See if the expression matches the index key */
- if (!match_index_to_operand(expr, c, ind))
- continue;
-
- /*
- * The equality operator must be a member of the index
- * opfamily, else it is not asserting the right kind of
- * equality behavior for this index. We assume the caller
- * determined it is an equality operator, so we don't need to
- * check any more tightly than this.
- */
- if (!op_in_opfamily(opr, ind->opfamily[c]))
- continue;
-
- /*
- * XXX at some point we may need to check collations here too.
- * For the moment we assume all collations reduce to the same
- * notion of equality.
- */
-
- matched = true; /* column is unique */
- break;
- }
-
- if (!matched)
+ if (lc == NULL)
break; /* no match; this index doesn't help us */
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 4d55c2ea591..da92d8ee414 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -990,11 +990,10 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_ext automatically adds any usable
+ * relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
- extra_clauses))
+ if (relation_has_unique_index_for(root, rel, clause_list, extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..cbade77b717 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,10 +71,7 @@ extern void generate_partitionwise_join_paths(PlannerInfo *root,
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
-extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist, List *exprlist,
- List *oprlist, List **extra_clauses);
+ List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
--
2.43.0
Hi Richard Guo
+/*
+ * Is given relation unique-ified?
+ *
+ * When the nominal jointype is JOIN_INNER, sjinfo->jointype is JOIN_SEMI, and
+ * the given rel is exactly the RHS of the semijoin, it indicates that the rel
+ * has been unique-ified.
+ */
+#define IS_UNIQUEIFIED_REL(rel, sjinfo, nominal_jointype) \
+ ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
+ bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+
In light of this commit (
https://github.com/postgres/postgres/commit/e035863c9a04beeecc254c3bfe48dab58e389e10),
I also recommend changing the macro to a static inline function. Macros are
harder to debug and lack type safety.
static inline bool
is_uniqueified_rel(RelOptInfo *rel, SpecialJoinInfo *sjinfo,
                   JoinType nominal_jointype)
{
    return nominal_jointype == JOIN_INNER &&
        sjinfo->jointype == JOIN_SEMI &&
        bms_equal(sjinfo->syn_righthand, rel->relids);
}
Thanks
On Mon, Aug 4, 2025 at 10:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
The v5 patch does not apply anymore, and here is a new rebase. There
are two main changes in v6:

* I choose to use the check I proposed earlier to determine whether a
relation has been unique-ified in costsize.c.

* Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. 0002 removes that loop and
simplifies the function accordingly.

Thanks
Richard
On Thu, Aug 7, 2025 at 6:04 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
In light of this commit (https://github.com/postgres/postgres/commit/e035863c9a04beeecc254c3bfe48dab58e389e10), I also recommend changing the macro to a static inline function. Macros are harder to debug and lack type safety.
I'm inclined not to do that. We already have other macros for
checking whether a relation is of a certain kind, and I'd prefer to
keep the new check consistent with those.
Thanks
Richard
Hi Richard
Thanks for your feedback. I agree this approach is better for keeping
the code style consistent.
Thanks
On Fri, Aug 8, 2025 at 9:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Aug 7, 2025 at 6:04 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
In light of this commit
(https://github.com/postgres/postgres/commit/e035863c9a04beeecc254c3bfe48dab58e389e10),
I also recommend changing the macro to a static inline function. Macros are
harder to debug and lack type safety.

I'm inclined not to do that. We already have other macros for
checking whether a relation is of a certain kind, and I'd prefer to
keep the new check consistent with those.

Thanks
Richard
On Mon, Aug 4, 2025 at 11:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
The v5 patch does not apply anymore, and here is a new rebase. There
are two main changes in v6:

* I choose to use the check I proposed earlier to determine whether a
relation has been unique-ified in costsize.c.

* Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. 0002 removes that loop and
simplifies the function accordingly.
Does anyone plan to review this patch further? I intend to push it in
two weeks unless there are any objections or additional comments.
Thanks
Richard
On 2025-Aug-12, Richard Guo wrote:
Does anyone plan to review this patch further? I intend to push it in
two weeks unless there are any objections or additional comments.
No review, but apparently "uniquify" is more widely accepted than
"uniqueify". No dictionary lists either words AFAICS, except that
Wiktionary lists the former:
https://en.wiktionary.org/wiki/uniquify
but apparently adjectives ending in "-e" are prone to lose it when a
verb is formed from them with "-ify", such as
https://www.merriam-webster.com/dictionary/falsify
https://www.merriam-webster.com/dictionary/intensify
https://www.merriam-webster.com/dictionary/simplify
There aren't many though. Most in this list don't end in -e:
https://en.wiktionary.org/w/index.php?title=Category:English_terms_suffixed_with_-ify
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)
Álvaro Herrera <alvherre@kurilemu.de> writes:
No review, but apparently "uniquify" is more widely accepted than
"uniqueify".
Personally I'd write "unique-ify", seeing that neither of the forms
without the dash are considered good English. Of course, if you
need to make identifiers out of this, that solution doesn't work;
but you could just avoid the construction --- say, "make_path_unique"
rather than "uniquify_path".
regards, tom lane
On Wed, Aug 13, 2025 at 1:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Álvaro Herrera <alvherre@kurilemu.de> writes:
No review, but apparently "uniquify" is more widely accepted than
"uniqueify".
Personally I'd write "unique-ify", seeing that neither of the forms
without the dash are considered good English. Of course, if you
need to make identifiers out of this, that solution doesn't work;
but you could just avoid the construction --- say, "make_path_unique"
rather than "uniquify_path".
Some 'git grep' work shows that, currently on master, we commonly use
the form "unique-ify" (with the dash) and its variants, such as:
unique-ify, unique-ified, unique-ification, and unique-ifying.
$ git grep -in 'unique-if' | wc -l
50
There is one instance of the form "uniquify":
planner.c:5107: * check). We can uniquify these tuples simply by just taking
And one instance of "uniqueify" (without the dash):
jsonb_util.c:65:static void uniqueifyJsonbObject()
Given this, I'd prefer to stick with "unique-ify", for consistency
with the majority usage in the codebase.
In this patch, the only instance that doesn't follow the "unique-ify"
form is the macro IS_UNIQUEIFIED_REL, as dashes are not allowed in C
identifiers. Maybe a better alternative is IS_RELATION_UNIQUE? Any
suggestions?
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
Given this, I'd prefer to stick with "unique-ify", for consistency
with the majority usage in the codebase.
+1. (Not but what I might've been responsible for many of the
existing usages, so my opinion is perhaps counting twice here.)
In this patch, the only instance that doesn't follow the "unique-ify"
form is the macro IS_UNIQUEIFIED_REL, as dashes are not allowed in C
identifiers. Maybe a better alternative is IS_RELATION_UNIQUE? Any
suggestions?
Hm ... to my ear, "unique-ified" implies that we took some positive
action to make the path's output unique, such as running it through
a hashagg or Unique node. IS_RELATION_UNIQUE only implies that the
output is unique, so for example a scan of a primary key should
satisfy such a predicate. Not having read the patch (I do hope
to get to that), I'm not sure which connotation you have in mind.
If it's the latter, IS_RELATION_UNIQUE seems like a fine name.
If it's the former, maybe something like "RELATION_WAS_MADE_UNIQUE"?
That's not very pretty though ...
regards, tom lane
On Wed, Aug 13, 2025 at 11:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
In this patch, the only instance that doesn't follow the "unique-ify"
form is the macro IS_UNIQUEIFIED_REL, as dashes are not allowed in C
identifiers. Maybe a better alternative is IS_RELATION_UNIQUE? Any
suggestions?
Hm ... to my ear, "unique-ified" implies that we took some positive
action to make the path's output unique, such as running it through
a hashagg or Unique node. IS_RELATION_UNIQUE only implies that the
output is unique, so for example a scan of a primary key should
satisfy such a predicate. Not having read the patch (I do hope
to get to that), I'm not sure which connotation you have in mind.
If it's the latter, IS_RELATION_UNIQUE seems like a fine name.
If it's the former, maybe something like "RELATION_WAS_MADE_UNIQUE"?
That's not very pretty though ...
It's the former: this macro is to signal that we've explicitly taken
steps to make the output of the relation unique. IMO, "unique-ified"
best describes this, but we cannot use it directly in the macro name
because of the dash.
Hmm, I think "RELATION_WAS_MADE_UNIQUE" works well because it clearly
conveys that the relation has been explicitly unique-ified. It's a
bit verbose, but I found that we have similar names in our codebase,
such as VAC_BLK_WAS_EAGER_SCANNED.
Thanks
Richard
On Tue, Aug 12, 2025 at 10:43 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Aug 4, 2025 at 11:08 AM Richard Guo <guofenglinux@gmail.com> wrote:
The v5 patch does not apply anymore, and here is a new rebase. There
are two main changes in v6:

* I choose to use the check I proposed earlier to determine whether a
relation has been unique-ified in costsize.c.

* Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. 0002 removes that loop and
simplifies the function accordingly.
Does anyone plan to review this patch further? I intend to push it in
two weeks unless there are any objections or additional comments.
Here's the updated version of the patch, which renames the macro
IS_UNIQUEIFIED_REL to RELATION_WAS_MADE_UNIQUE, and includes some
comment updates as well. I plan to push it soon, barring any
objections.
This patch removes the last call to make_sort_from_sortclauses(), so
I'm wondering if we can safely remove the function itself. Or should
we keep it around in case it's used by extensions or might be needed
in the future?
Thanks
Richard
Attachments:
v7-0001-Pathify-RHS-unique-ification-for-semijoin-plannin.patch (application/octet-stream)
From 3a751e18a4a0273b88e98513931fe560c51014d1 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 May 2025 12:32:29 +0900
Subject: [PATCH v7 1/2] Pathify RHS unique-ification for semijoin planning
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. Besides, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
This patch also removes the UNIQUE_PATH_NOOP related code along the
way, as it is dead code -- if the RHS rel is provably unique, the
semijoin should have already been simplified to a plain inner join by
analyzejoins.c.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
---
src/backend/optimizer/README | 3 +-
src/backend/optimizer/path/costsize.c | 11 +-
src/backend/optimizer/path/joinpath.c | 338 ++++--------
src/backend/optimizer/path/joinrels.c | 18 +-
src/backend/optimizer/plan/createplan.c | 301 +----------
src/backend/optimizer/plan/planner.c | 518 ++++++++++++++++++-
src/backend/optimizer/prep/prepunion.c | 30 +-
src/backend/optimizer/util/pathnode.c | 306 +----------
src/backend/optimizer/util/relnode.c | 13 +-
src/include/nodes/nodes.h | 4 +-
src/include/nodes/pathnodes.h | 77 +--
src/include/optimizer/pathnode.h | 12 +-
src/include/optimizer/planner.h | 3 +
src/test/regress/expected/join.out | 15 +-
src/test/regress/expected/partition_join.out | 94 ++--
src/test/regress/expected/subselect.out | 233 ++++++++-
src/test/regress/sql/subselect.sql | 67 +++
src/tools/pgindent/typedefs.list | 2 -
18 files changed, 1074 insertions(+), 971 deletions(-)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..843368096fd 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -640,7 +640,6 @@ RelOptInfo - a relation or joined relations
GroupResultPath - childless Result plan node (used for degenerate grouping)
MaterialPath - a Material plan node
MemoizePath - a Memoize plan node for caching tuples from sub-paths
- UniquePath - remove duplicate rows (either by hashing or sorting)
GatherPath - collect the results of parallel workers
GatherMergePath - collect parallel results, preserving their common sort order
ProjectionPath - a Result plan node with child (used for projection)
@@ -648,7 +647,7 @@ RelOptInfo - a relation or joined relations
SortPath - a Sort plan node applied to some sub-path
IncrementalSortPath - an IncrementalSort plan node applied to some sub-path
GroupPath - a Group plan node applied to some sub-path
- UpperUniquePath - a Unique plan node applied to some sub-path
+ UniquePath - a Unique plan node applied to some sub-path
AggPath - an Agg plan node applied to some sub-path
GroupingSetsPath - an Agg plan node used to implement GROUPING SETS
MinMaxAggPath - a Result plan node with subplans performing MIN/MAX
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 344a3188317..783dca8a4ac 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -3966,10 +3966,12 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
* when we should not. Can we do better without expensive selectivity
* computations?
*
- * The whole issue is moot if we are working from a unique-ified outer
- * input, or if we know we don't need to mark/restore at all.
+ * The whole issue is moot if we know we don't need to mark/restore at
+ * all, or if we are working from a unique-ified outer input.
*/
- if (IsA(outer_path, UniquePath) || path->skip_mark_restore)
+ if (path->skip_mark_restore ||
+ RELATION_WAS_MADE_UNIQUE(outer_path->parent, extra->sjinfo,
+ path->jpath.jointype))
rescannedtuples = 0;
else
{
@@ -4364,7 +4366,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
* because we avoid contaminating the cache with a value that's wrong for
* non-unique-ified paths.
*/
- if (IsA(inner_path, UniquePath))
+ if (RELATION_WAS_MADE_UNIQUE(inner_path->parent, extra->sjinfo,
+ path->jpath.jointype))
{
innerbucketsize = 1.0 / virtualbuckets;
innermcvfreq = 0.0;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ebedc5574ca..3b9407eb2eb 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -112,12 +112,12 @@ static void generate_mergejoin_paths(PlannerInfo *root,
* "flipped around" if we are considering joining the rels in the opposite
* direction from what's indicated in sjinfo.
*
- * Also, this routine and others in this module accept the special JoinTypes
- * JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
- * unique-ify the outer or inner relation and then apply a regular inner
- * join. These values are not allowed to propagate outside this module,
- * however. Path cost estimation code may need to recognize that it's
- * dealing with such a case --- the combination of nominal jointype INNER
+ * Also, this routine accepts the special JoinTypes JOIN_UNIQUE_OUTER and
+ * JOIN_UNIQUE_INNER to indicate that the outer or inner relation has been
+ * unique-ified and a regular inner join should then be applied. These values
+ * are not allowed to propagate outside this routine, however. Path cost
+ * estimation code, as well as match_unsorted_outer, may need to recognize that
+ * it's dealing with such a case --- the combination of nominal jointype INNER
* with sjinfo->jointype == JOIN_SEMI indicates that.
*/
void
@@ -129,6 +129,7 @@ add_paths_to_joinrel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
List *restrictlist)
{
+ JoinType save_jointype = jointype;
JoinPathExtraData extra;
bool mergejoin_allowed = true;
ListCell *lc;
@@ -165,10 +166,10 @@ add_paths_to_joinrel(PlannerInfo *root,
* reduce_unique_semijoins would've simplified it), so there's no point in
* calling innerrel_is_unique. However, if the LHS covers all of the
* semijoin's min_lefthand, then it's appropriate to set inner_unique
- * because the path produced by create_unique_path will be unique relative
- * to the LHS. (If we have an LHS that's only part of the min_lefthand,
- * that is *not* true.) For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
- * letting that value escape this module.
+ * because the unique relation produced by create_unique_paths will be
+ * unique relative to the LHS. (If we have an LHS that's only part of the
+ * min_lefthand, that is *not* true.) For JOIN_UNIQUE_OUTER, pass
+ * JOIN_INNER to avoid letting that value escape this module.
*/
switch (jointype)
{
@@ -199,6 +200,13 @@ add_paths_to_joinrel(PlannerInfo *root,
break;
}
+ /*
+ * If the outer or inner relation has been unique-ified, handle as a plain
+ * inner join.
+ */
+ if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
+ jointype = JOIN_INNER;
+
/*
* Find potential mergejoin clauses. We can skip this if we are not
* interested in doing a mergejoin. However, mergejoin may be our only
@@ -329,7 +337,7 @@ add_paths_to_joinrel(PlannerInfo *root,
joinrel->fdwroutine->GetForeignJoinPaths)
joinrel->fdwroutine->GetForeignJoinPaths(root, joinrel,
outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
/*
* 6. Finally, give extensions a chance to manipulate the path list. They
@@ -339,7 +347,7 @@ add_paths_to_joinrel(PlannerInfo *root,
*/
if (set_join_pathlist_hook)
set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
- jointype, &extra);
+ save_jointype, &extra);
}
/*
@@ -1364,7 +1372,6 @@ sort_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *outer_path;
Path *inner_path;
Path *cheapest_partial_outer = NULL;
@@ -1402,38 +1409,16 @@ sort_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(inner_path, outerrel))
return;
- /*
- * If unique-ification is requested, do it and then handle as a plain
- * inner join.
- */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- outer_path = (Path *) create_unique_path(root, outerrel,
- outer_path, extra->sjinfo);
- Assert(outer_path);
- jointype = JOIN_INNER;
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- inner_path = (Path *) create_unique_path(root, innerrel,
- inner_path, extra->sjinfo);
- Assert(inner_path);
- jointype = JOIN_INNER;
- }
-
/*
* If the joinrel is parallel-safe, we may be able to consider a partial
- * merge join. However, we can't handle JOIN_UNIQUE_OUTER, because the
- * outer path will be partial, and therefore we won't be able to properly
- * guarantee uniqueness. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT
- * and JOIN_RIGHT_ANTI, because they can produce false null extended rows.
+ * merge join. However, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * JOIN_RIGHT_ANTI, because they can produce false null extended rows.
* Also, the resulting path must not be parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -1441,7 +1426,7 @@ sort_inner_and_outer(PlannerInfo *root,
if (inner_path->parallel_safe)
cheapest_safe_inner = inner_path;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
@@ -1580,13 +1565,9 @@ generate_mergejoin_paths(PlannerInfo *root,
List *trialsortkeys;
Path *cheapest_startup_inner;
Path *cheapest_total_inner;
- JoinType save_jointype = jointype;
int num_sortkeys;
int sortkeycnt;
- if (jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/* Look for useful mergeclauses (if any) */
mergeclauses =
find_mergeclauses_for_outer_pathkeys(root,
@@ -1636,10 +1617,6 @@ generate_mergejoin_paths(PlannerInfo *root,
extra,
is_partial);
- /* Can't do anything else if inner path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
/*
* Look for presorted inner paths that satisfy the innersortkey list ---
* or any truncation thereof, if we are allowed to build a mergejoin using
@@ -1819,7 +1796,6 @@ match_unsorted_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool nestjoinOK;
bool useallclauses;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
@@ -1855,12 +1831,6 @@ match_unsorted_outer(PlannerInfo *root,
nestjoinOK = false;
useallclauses = true;
break;
- case JOIN_UNIQUE_OUTER:
- case JOIN_UNIQUE_INNER:
- jointype = JOIN_INNER;
- nestjoinOK = true;
- useallclauses = false;
- break;
default:
elog(ERROR, "unrecognized join type: %d",
(int) jointype);
@@ -1873,24 +1843,20 @@ match_unsorted_outer(PlannerInfo *root,
* If inner_cheapest_total is parameterized by the outer rel, ignore it;
* we will consider it below as a member of cheapest_parameterized_paths,
* but the other possibilities considered in this routine aren't usable.
+ *
+ * Furthermore, if the inner side is a unique-ified relation, we cannot
+ * generate any valid paths here, because the inner rel's dependency on
+ * the outer rel makes unique-ification meaningless.
*/
if (PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
+ {
inner_cheapest_total = NULL;
- /*
- * If we need to unique-ify the inner path, we will consider only the
- * cheapest-total inner.
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /* No way to do this with an inner path parameterized by outer rel */
- if (inner_cheapest_total == NULL)
+ if (RELATION_WAS_MADE_UNIQUE(innerrel, extra->sjinfo, jointype))
return;
- inner_cheapest_total = (Path *)
- create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
- Assert(inner_cheapest_total);
}
- else if (nestjoinOK)
+
+ if (nestjoinOK)
{
/*
* Consider materializing the cheapest inner path, unless
@@ -1914,20 +1880,6 @@ match_unsorted_outer(PlannerInfo *root,
if (PATH_PARAM_BY_REL(outerpath, innerrel))
continue;
- /*
- * If we need to unique-ify the outer path, it's pointless to consider
- * any but the cheapest outer. (XXX we don't consider parameterized
- * outers, nor inners, for unique-ified cases. Should we?)
- */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- {
- if (outerpath != outerrel->cheapest_total_path)
- continue;
- outerpath = (Path *) create_unique_path(root, outerrel,
- outerpath, extra->sjinfo);
- Assert(outerpath);
- }
-
/*
* The result will have this sort order (even if it is implemented as
* a nestloop, and even if some of the mergeclauses are implemented by
@@ -1936,21 +1888,7 @@ match_unsorted_outer(PlannerInfo *root,
merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
outerpath->pathkeys);
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- /*
- * Consider nestloop join, but only with the unique-ified cheapest
- * inner path
- */
- try_nestloop_path(root,
- joinrel,
- outerpath,
- inner_cheapest_total,
- merge_pathkeys,
- jointype,
- extra);
- }
- else if (nestjoinOK)
+ if (nestjoinOK)
{
/*
* Consider nestloop joins using this outer path and various
@@ -2001,17 +1939,13 @@ match_unsorted_outer(PlannerInfo *root,
extra);
}
- /* Can't do anything else if outer path needs to be unique'd */
- if (save_jointype == JOIN_UNIQUE_OUTER)
- continue;
-
/* Can't do anything else if inner rel is parameterized by outer */
if (inner_cheapest_total == NULL)
continue;
/* Generate merge join paths */
generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
- save_jointype, extra, useallclauses,
+ jointype, extra, useallclauses,
inner_cheapest_total, merge_pathkeys,
false);
}
@@ -2019,41 +1953,35 @@ match_unsorted_outer(PlannerInfo *root,
/*
* Consider partial nestloop and mergejoin plan if outerrel has any
* partial path and the joinrel is parallel-safe. However, we can't
- * handle JOIN_UNIQUE_OUTER, because the outer path will be partial, and
- * therefore we won't be able to properly guarantee uniqueness. Nor can
- * we handle joins needing lateral rels, since partial paths must not be
- * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
+ * handle joins needing lateral rels, since partial paths must not be
+ * parameterized. Similarly, we can't handle JOIN_FULL, JOIN_RIGHT and
* JOIN_RIGHT_ANTI, because they can produce false null extended rows.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
- save_jointype != JOIN_FULL &&
- save_jointype != JOIN_RIGHT &&
- save_jointype != JOIN_RIGHT_ANTI &&
+ jointype != JOIN_FULL &&
+ jointype != JOIN_RIGHT &&
+ jointype != JOIN_RIGHT_ANTI &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
if (nestjoinOK)
consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
- save_jointype, extra);
+ jointype, extra);
/*
* If inner_cheapest_total is NULL or non parallel-safe then find the
- * cheapest total parallel safe path. If doing JOIN_UNIQUE_INNER, we
- * can't use any alternative inner path.
+ * cheapest total parallel safe path.
*/
if (inner_cheapest_total == NULL ||
!inner_cheapest_total->parallel_safe)
{
- if (save_jointype == JOIN_UNIQUE_INNER)
- return;
-
- inner_cheapest_total = get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
}
if (inner_cheapest_total)
consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
- save_jointype, extra,
+ jointype, extra,
inner_cheapest_total);
}
}
@@ -2118,24 +2046,17 @@ consider_parallel_nestloop(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
Path *inner_cheapest_total = innerrel->cheapest_total_path;
Path *matpath = NULL;
ListCell *lc1;
- if (jointype == JOIN_UNIQUE_INNER)
- jointype = JOIN_INNER;
-
/*
- * Consider materializing the cheapest inner path, unless: 1) we're doing
- * JOIN_UNIQUE_INNER, because in this case we have to unique-ify the
- * cheapest inner path, 2) enable_material is off, 3) the cheapest inner
- * path is not parallel-safe, 4) the cheapest inner path is parameterized
- * by the outer rel, or 5) the cheapest inner path materializes its output
- * anyway.
+ * Consider materializing the cheapest inner path, unless: 1)
+ * enable_material is off, 2) the cheapest inner path is not
+ * parallel-safe, 3) the cheapest inner path is parameterized by the outer
+ * rel, or 4) the cheapest inner path materializes its output anyway.
*/
- if (save_jointype != JOIN_UNIQUE_INNER &&
- enable_material && inner_cheapest_total->parallel_safe &&
+ if (enable_material && inner_cheapest_total->parallel_safe &&
!PATH_PARAM_BY_REL(inner_cheapest_total, outerrel) &&
!ExecMaterializesOutput(inner_cheapest_total->pathtype))
{
@@ -2169,23 +2090,6 @@ consider_parallel_nestloop(PlannerInfo *root,
if (!innerpath->parallel_safe)
continue;
- /*
- * If we're doing JOIN_UNIQUE_INNER, we can only use the inner's
- * cheapest_total_path, and we have to unique-ify it. (We might
- * be able to relax this to allow other safe, unparameterized
- * inner paths, but right now create_unique_path is not on board
- * with that.)
- */
- if (save_jointype == JOIN_UNIQUE_INNER)
- {
- if (innerpath != innerrel->cheapest_total_path)
- continue;
- innerpath = (Path *) create_unique_path(root, innerrel,
- innerpath,
- extra->sjinfo);
- Assert(innerpath);
- }
-
try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
pathkeys, jointype, extra);
@@ -2227,7 +2131,6 @@ hash_inner_and_outer(PlannerInfo *root,
JoinType jointype,
JoinPathExtraData *extra)
{
- JoinType save_jointype = jointype;
bool isouterjoin = IS_OUTER_JOIN(jointype);
List *hashclauses;
ListCell *l;
@@ -2290,6 +2193,8 @@ hash_inner_and_outer(PlannerInfo *root,
Path *cheapest_startup_outer = outerrel->cheapest_startup_path;
Path *cheapest_total_outer = outerrel->cheapest_total_path;
Path *cheapest_total_inner = innerrel->cheapest_total_path;
+ ListCell *lc1;
+ ListCell *lc2;
/*
* If either cheapest-total path is parameterized by the other rel, we
@@ -2301,114 +2206,64 @@ hash_inner_and_outer(PlannerInfo *root,
PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
return;
- /* Unique-ify if need be; we ignore parameterized possibilities */
- if (jointype == JOIN_UNIQUE_OUTER)
- {
- cheapest_total_outer = (Path *)
- create_unique_path(root, outerrel,
- cheapest_total_outer, extra->sjinfo);
- Assert(cheapest_total_outer);
- jointype = JOIN_INNER;
- try_hashjoin_path(root,
- joinrel,
- cheapest_total_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- /* no possibility of cheap startup here */
- }
- else if (jointype == JOIN_UNIQUE_INNER)
- {
- cheapest_total_inner = (Path *)
- create_unique_path(root, innerrel,
- cheapest_total_inner, extra->sjinfo);
- Assert(cheapest_total_inner);
- jointype = JOIN_INNER;
+ /*
+ * Consider the cheapest startup outer together with the cheapest
+ * total inner, and then consider pairings of cheapest-total paths
+ * including parameterized ones. There is no use in generating
+ * parameterized paths on the basis of possibly cheap startup cost, so
+ * this is sufficient.
+ */
+ if (cheapest_startup_outer != NULL)
try_hashjoin_path(root,
joinrel,
- cheapest_total_outer,
+ cheapest_startup_outer,
cheapest_total_inner,
hashclauses,
jointype,
extra);
- if (cheapest_startup_outer != NULL &&
- cheapest_startup_outer != cheapest_total_outer)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
- }
- else
+
+ foreach(lc1, outerrel->cheapest_parameterized_paths)
{
+ Path *outerpath = (Path *) lfirst(lc1);
+
/*
- * For other jointypes, we consider the cheapest startup outer
- * together with the cheapest total inner, and then consider
- * pairings of cheapest-total paths including parameterized ones.
- * There is no use in generating parameterized paths on the basis
- * of possibly cheap startup cost, so this is sufficient.
+ * We cannot use an outer path that is parameterized by the inner
+ * rel.
*/
- ListCell *lc1;
- ListCell *lc2;
-
- if (cheapest_startup_outer != NULL)
- try_hashjoin_path(root,
- joinrel,
- cheapest_startup_outer,
- cheapest_total_inner,
- hashclauses,
- jointype,
- extra);
+ if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ continue;
- foreach(lc1, outerrel->cheapest_parameterized_paths)
+ foreach(lc2, innerrel->cheapest_parameterized_paths)
{
- Path *outerpath = (Path *) lfirst(lc1);
+ Path *innerpath = (Path *) lfirst(lc2);
/*
- * We cannot use an outer path that is parameterized by the
- * inner rel.
+ * We cannot use an inner path that is parameterized by the
+ * outer rel, either.
*/
- if (PATH_PARAM_BY_REL(outerpath, innerrel))
+ if (PATH_PARAM_BY_REL(innerpath, outerrel))
continue;
- foreach(lc2, innerrel->cheapest_parameterized_paths)
- {
- Path *innerpath = (Path *) lfirst(lc2);
-
- /*
- * We cannot use an inner path that is parameterized by
- * the outer rel, either.
- */
- if (PATH_PARAM_BY_REL(innerpath, outerrel))
- continue;
+ if (outerpath == cheapest_startup_outer &&
+ innerpath == cheapest_total_inner)
+ continue; /* already tried it */
- if (outerpath == cheapest_startup_outer &&
- innerpath == cheapest_total_inner)
- continue; /* already tried it */
-
- try_hashjoin_path(root,
- joinrel,
- outerpath,
- innerpath,
- hashclauses,
- jointype,
- extra);
- }
+ try_hashjoin_path(root,
+ joinrel,
+ outerpath,
+ innerpath,
+ hashclauses,
+ jointype,
+ extra);
}
}
/*
* If the joinrel is parallel-safe, we may be able to consider a
- * partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
- * because the outer path will be partial, and therefore we won't be
- * able to properly guarantee uniqueness. Also, the resulting path
- * must not be parameterized.
+ * partial hash join. However, the resulting path must not be
+ * parameterized.
*/
if (joinrel->consider_parallel &&
- save_jointype != JOIN_UNIQUE_OUTER &&
outerrel->partial_pathlist != NIL &&
bms_is_empty(joinrel->lateral_relids))
{
@@ -2421,11 +2276,9 @@ hash_inner_and_outer(PlannerInfo *root,
/*
* Can we use a partial inner plan too, so that we can build a
- * shared hash table in parallel? We can't handle
- * JOIN_UNIQUE_INNER because we can't guarantee uniqueness.
+ * shared hash table in parallel?
*/
if (innerrel->partial_pathlist != NIL &&
- save_jointype != JOIN_UNIQUE_INNER &&
enable_parallel_hash)
{
cheapest_partial_inner =
@@ -2441,19 +2294,18 @@ hash_inner_and_outer(PlannerInfo *root,
* Normally, given that the joinrel is parallel-safe, the cheapest
* total inner path will also be parallel-safe, but if not, we'll
* have to search for the cheapest safe, unparameterized inner
- * path. If doing JOIN_UNIQUE_INNER, we can't use any alternative
- * inner path. If full, right, right-semi or right-anti join, we
- * can't use parallelism (building the hash table in each backend)
+ * path. If full, right, right-semi or right-anti join, we can't
+ * use parallelism (building the hash table in each backend)
* because no one process has all the match bits.
*/
- if (save_jointype == JOIN_FULL ||
- save_jointype == JOIN_RIGHT ||
- save_jointype == JOIN_RIGHT_SEMI ||
- save_jointype == JOIN_RIGHT_ANTI)
+ if (jointype == JOIN_FULL ||
+ jointype == JOIN_RIGHT ||
+ jointype == JOIN_RIGHT_SEMI ||
+ jointype == JOIN_RIGHT_ANTI)
cheapest_safe_inner = NULL;
else if (cheapest_total_inner->parallel_safe)
cheapest_safe_inner = cheapest_total_inner;
- else if (save_jointype != JOIN_UNIQUE_INNER)
+ else
cheapest_safe_inner =
get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..535248aa525 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -19,6 +19,7 @@
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/planner.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
@@ -444,8 +445,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel2, sjinfo) != NULL)
{
/*----------
* For a semijoin, we can join the RHS to anything else by
@@ -477,8 +477,7 @@ join_is_legal(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
}
else if (sjinfo->jointype == JOIN_SEMI &&
bms_equal(sjinfo->syn_righthand, rel1->relids) &&
- create_unique_path(root, rel1, rel1->cheapest_total_path,
- sjinfo) != NULL)
+ create_unique_paths(root, rel1, sjinfo) != NULL)
{
/* Reversed semijoin case */
if (match_sjinfo)
@@ -886,6 +885,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
+ RelOptInfo *unique_rel2;
+
/*
* Consider paths using each rel as both outer and inner. Depending on
* the join type, a provably empty outer or inner rel might mean the join
@@ -991,14 +992,13 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
/*
* If we know how to unique-ify the RHS and one input rel is
* exactly the RHS (not a superset) we can consider unique-ifying
- * it and then doing a regular join. (The create_unique_path
+ * it and then doing a regular join. (The create_unique_paths
* check here is probably redundant with what join_is_legal did,
* but if so the check is cheap because it's cached. So test
* anyway to be sure.)
*/
if (bms_equal(sjinfo->syn_righthand, rel2->relids) &&
- create_unique_path(root, rel2, rel2->cheapest_total_path,
- sjinfo) != NULL)
+ (unique_rel2 = create_unique_paths(root, rel2, sjinfo)) != NULL)
{
if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
restriction_is_constant_false(restrictlist, joinrel, false))
@@ -1006,10 +1006,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
- add_paths_to_joinrel(root, joinrel, rel1, rel2,
+ add_paths_to_joinrel(root, joinrel, rel1, unique_rel2,
JOIN_UNIQUE_INNER, sjinfo,
restrictlist);
- add_paths_to_joinrel(root, joinrel, rel2, rel1,
+ add_paths_to_joinrel(root, joinrel, unique_rel2, rel1,
JOIN_UNIQUE_OUTER, sjinfo,
restrictlist);
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9fd5c31edf2..6791cbeb416 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -95,8 +95,6 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
int flags);
static Memoize *create_memoize_plan(PlannerInfo *root, MemoizePath *best_path,
int flags);
-static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
- int flags);
static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
static Plan *create_projection_plan(PlannerInfo *root,
ProjectionPath *best_path,
@@ -106,8 +104,7 @@ static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
static IncrementalSort *create_incrementalsort_plan(PlannerInfo *root,
IncrementalSortPath *best_path, int flags);
static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
-static Unique *create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path,
- int flags);
+static Unique *create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags);
static Agg *create_agg_plan(PlannerInfo *root, AggPath *best_path);
static Plan *create_groupingsets_plan(PlannerInfo *root, GroupingSetsPath *best_path);
static Result *create_minmaxagg_plan(PlannerInfo *root, MinMaxAggPath *best_path);
@@ -296,9 +293,9 @@ static WindowAgg *make_windowagg(List *tlist, WindowClause *wc,
static Group *make_group(List *tlist, List *qual, int numGroupCols,
AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations,
Plan *lefttree);
-static Unique *make_unique_from_sortclauses(Plan *lefttree, List *distinctList);
static Unique *make_unique_from_pathkeys(Plan *lefttree,
- List *pathkeys, int numCols);
+ List *pathkeys, int numCols,
+ Relids relids);
static Gather *make_gather(List *qptlist, List *qpqual,
int nworkers, int rescan_param, bool single_copy, Plan *subplan);
static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy,
@@ -470,19 +467,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
flags);
break;
case T_Unique:
- if (IsA(best_path, UpperUniquePath))
- {
- plan = (Plan *) create_upper_unique_plan(root,
- (UpperUniquePath *) best_path,
- flags);
- }
- else
- {
- Assert(IsA(best_path, UniquePath));
- plan = create_unique_plan(root,
- (UniquePath *) best_path,
- flags);
- }
+ plan = (Plan *) create_unique_plan(root,
+ (UniquePath *) best_path,
+ flags);
break;
case T_Gather:
plan = (Plan *) create_gather_plan(root,
@@ -1764,207 +1751,6 @@ create_memoize_plan(PlannerInfo *root, MemoizePath *best_path, int flags)
return plan;
}
-/*
- * create_unique_plan
- * Create a Unique plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
-{
- Plan *plan;
- Plan *subplan;
- List *in_operators;
- List *uniq_exprs;
- List *newtlist;
- int nextresno;
- bool newitems;
- int numGroupCols;
- AttrNumber *groupColIdx;
- Oid *groupCollations;
- int groupColPos;
- ListCell *l;
-
- /* Unique doesn't project, so tlist requirements pass through */
- subplan = create_plan_recurse(root, best_path->subpath, flags);
-
- /* Done if we don't need to do any actual unique-ifying */
- if (best_path->umethod == UNIQUE_PATH_NOOP)
- return subplan;
-
- /*
- * As constructed, the subplan has a "flat" tlist containing just the Vars
- * needed here and at upper levels. The values we are supposed to
- * unique-ify may be expressions in these variables. We have to add any
- * such expressions to the subplan's tlist.
- *
- * The subplan may have a "physical" tlist if it is a simple scan plan. If
- * we're going to sort, this should be reduced to the regular tlist, so
- * that we don't sort more data than we need to. For hashing, the tlist
- * should be left as-is if we don't need to add any expressions; but if we
- * do have to add expressions, then a projection step will be needed at
- * runtime anyway, so we may as well remove unneeded items. Therefore
- * newtlist starts from build_path_tlist() not just a copy of the
- * subplan's tlist; and we don't install it into the subplan unless we are
- * sorting or stuff has to be added.
- */
- in_operators = best_path->in_operators;
- uniq_exprs = best_path->uniq_exprs;
-
- /* initialize modified subplan tlist as just the "required" vars */
- newtlist = build_path_tlist(root, &best_path->path);
- nextresno = list_length(newtlist) + 1;
- newitems = false;
-
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle)
- {
- tle = makeTargetEntry((Expr *) uniqexpr,
- nextresno,
- NULL,
- false);
- newtlist = lappend(newtlist, tle);
- nextresno++;
- newitems = true;
- }
- }
-
- /* Use change_plan_targetlist in case we need to insert a Result node */
- if (newitems || best_path->umethod == UNIQUE_PATH_SORT)
- subplan = change_plan_targetlist(subplan, newtlist,
- best_path->path.parallel_safe);
-
- /*
- * Build control information showing which subplan output columns are to
- * be examined by the grouping step. Unfortunately we can't merge this
- * with the previous loop, since we didn't then know which version of the
- * subplan tlist we'd end up using.
- */
- newtlist = subplan->targetlist;
- numGroupCols = list_length(uniq_exprs);
- groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
- groupCollations = (Oid *) palloc(numGroupCols * sizeof(Oid));
-
- groupColPos = 0;
- foreach(l, uniq_exprs)
- {
- Expr *uniqexpr = lfirst(l);
- TargetEntry *tle;
-
- tle = tlist_member(uniqexpr, newtlist);
- if (!tle) /* shouldn't happen */
- elog(ERROR, "failed to find unique expression in subplan tlist");
- groupColIdx[groupColPos] = tle->resno;
- groupCollations[groupColPos] = exprCollation((Node *) tle->expr);
- groupColPos++;
- }
-
- if (best_path->umethod == UNIQUE_PATH_HASH)
- {
- Oid *groupOperators;
-
- /*
- * Get the hashable equality operators for the Agg node to use.
- * Normally these are the same as the IN clause operators, but if
- * those are cross-type operators then the equality operators are the
- * ones for the IN clause operators' RHS datatype.
- */
- groupOperators = (Oid *) palloc(numGroupCols * sizeof(Oid));
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid eq_oper;
-
- if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
- elog(ERROR, "could not find compatible hash operator for operator %u",
- in_oper);
- groupOperators[groupColPos++] = eq_oper;
- }
-
- /*
- * Since the Agg node is going to project anyway, we can give it the
- * minimum output tlist, without any stuff we might have added to the
- * subplan tlist.
- */
- plan = (Plan *) make_agg(build_path_tlist(root, &best_path->path),
- NIL,
- AGG_HASHED,
- AGGSPLIT_SIMPLE,
- numGroupCols,
- groupColIdx,
- groupOperators,
- groupCollations,
- NIL,
- NIL,
- best_path->path.rows,
- 0,
- subplan);
- }
- else
- {
- List *sortList = NIL;
- Sort *sort;
-
- /* Create an ORDER BY list to sort the input compatibly */
- groupColPos = 0;
- foreach(l, in_operators)
- {
- Oid in_oper = lfirst_oid(l);
- Oid sortop;
- Oid eqop;
- TargetEntry *tle;
- SortGroupClause *sortcl;
-
- sortop = get_ordering_op_for_equality_op(in_oper, false);
- if (!OidIsValid(sortop)) /* shouldn't happen */
- elog(ERROR, "could not find ordering operator for equality operator %u",
- in_oper);
-
- /*
- * The Unique node will need equality operators. Normally these
- * are the same as the IN clause operators, but if those are
- * cross-type operators then the equality operators are the ones
- * for the IN clause operators' RHS datatype.
- */
- eqop = get_equality_op_for_ordering_op(sortop, NULL);
- if (!OidIsValid(eqop)) /* shouldn't happen */
- elog(ERROR, "could not find equality operator for ordering operator %u",
- sortop);
-
- tle = get_tle_by_resno(subplan->targetlist,
- groupColIdx[groupColPos]);
- Assert(tle != NULL);
-
- sortcl = makeNode(SortGroupClause);
- sortcl->tleSortGroupRef = assignSortGroupRef(tle,
- subplan->targetlist);
- sortcl->eqop = eqop;
- sortcl->sortop = sortop;
- sortcl->reverse_sort = false;
- sortcl->nulls_first = false;
- sortcl->hashable = false; /* no need to make this accurate */
- sortList = lappend(sortList, sortcl);
- groupColPos++;
- }
- sort = make_sort_from_sortclauses(sortList, subplan);
- label_sort_with_costsize(root, sort, -1.0);
- plan = (Plan *) make_unique_from_sortclauses((Plan *) sort, sortList);
- }
-
- /* Copy cost data from Path to Plan */
- copy_generic_path_info(plan, &best_path->path);
-
- return plan;
-}
-
/*
* create_gather_plan
*
@@ -2322,13 +2108,13 @@ create_group_plan(PlannerInfo *root, GroupPath *best_path)
}
/*
- * create_upper_unique_plan
+ * create_unique_plan
*
* Create a Unique plan for 'best_path' and (recursively) plans
* for its subpaths.
*/
static Unique *
-create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flags)
+create_unique_plan(PlannerInfo *root, UniquePath *best_path, int flags)
{
Unique *plan;
Plan *subplan;
@@ -2340,9 +2126,17 @@ create_upper_unique_plan(PlannerInfo *root, UpperUniquePath *best_path, int flag
subplan = create_plan_recurse(root, best_path->subpath,
flags | CP_LABEL_TLIST);
+ /*
+ * make_unique_from_pathkeys calls find_ec_member_matching_expr, which
+ * will ignore any child EC members that don't belong to the given relids.
+ * Thus, if this unique path is based on a child relation, we must pass
+ * its relids.
+ */
plan = make_unique_from_pathkeys(subplan,
best_path->path.pathkeys,
- best_path->numkeys);
+ best_path->numkeys,
+ IS_OTHER_REL(best_path->path.parent) ?
+ best_path->path.parent->relids : NULL);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -6880,61 +6674,14 @@ make_group(List *tlist,
}
/*
- * distinctList is a list of SortGroupClauses, identifying the targetlist items
- * that should be considered by the Unique filter. The input path must
- * already be sorted accordingly.
- */
-static Unique *
-make_unique_from_sortclauses(Plan *lefttree, List *distinctList)
-{
- Unique *node = makeNode(Unique);
- Plan *plan = &node->plan;
- int numCols = list_length(distinctList);
- int keyno = 0;
- AttrNumber *uniqColIdx;
- Oid *uniqOperators;
- Oid *uniqCollations;
- ListCell *slitem;
-
- plan->targetlist = lefttree->targetlist;
- plan->qual = NIL;
- plan->lefttree = lefttree;
- plan->righttree = NULL;
-
- /*
- * convert SortGroupClause list into arrays of attr indexes and equality
- * operators, as wanted by executor
- */
- Assert(numCols > 0);
- uniqColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols);
- uniqOperators = (Oid *) palloc(sizeof(Oid) * numCols);
- uniqCollations = (Oid *) palloc(sizeof(Oid) * numCols);
-
- foreach(slitem, distinctList)
- {
- SortGroupClause *sortcl = (SortGroupClause *) lfirst(slitem);
- TargetEntry *tle = get_sortgroupclause_tle(sortcl, plan->targetlist);
-
- uniqColIdx[keyno] = tle->resno;
- uniqOperators[keyno] = sortcl->eqop;
- uniqCollations[keyno] = exprCollation((Node *) tle->expr);
- Assert(OidIsValid(uniqOperators[keyno]));
- keyno++;
- }
-
- node->numCols = numCols;
- node->uniqColIdx = uniqColIdx;
- node->uniqOperators = uniqOperators;
- node->uniqCollations = uniqCollations;
-
- return node;
-}
-
-/*
- * as above, but use pathkeys to identify the sort columns and semantics
+ * pathkeys is a list of PathKeys, identifying the sort columns and semantics.
+ * The input plan must already be sorted accordingly.
+ *
+ * relids identifies the child relation being unique-ified, if any.
*/
static Unique *
-make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
+make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols,
+ Relids relids)
{
Unique *node = makeNode(Unique);
Plan *plan = &node->plan;
@@ -6997,7 +6744,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
foreach(j, plan->targetlist)
{
tle = (TargetEntry *) lfirst(j);
- em = find_ec_member_matching_expr(ec, tle->expr, NULL);
+ em = find_ec_member_matching_expr(ec, tle->expr, relids);
if (em)
{
/* found expr already in tlist */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0d5a692e5fd..e6fbabcb28f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -268,6 +268,12 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
static int common_prefix_cmp(const void *a, const void *b);
static List *generate_setop_child_grouplist(SetOperationStmt *op,
List *targetlist);
+static void create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
+static void create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel);
/*****************************************************************************
@@ -4939,10 +4945,10 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_partial_path(partial_distinct_rel, (Path *)
- create_upper_unique_path(root, partial_distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, partial_distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -5133,10 +5139,10 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
else
{
add_path(distinct_rel, (Path *)
- create_upper_unique_path(root, distinct_rel,
- sorted_path,
- list_length(root->distinct_pathkeys),
- numDistinctRows));
+ create_unique_path(root, distinct_rel,
+ sorted_path,
+ list_length(root->distinct_pathkeys),
+ numDistinctRows));
}
}
}
@@ -8270,3 +8276,499 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist)
return grouplist;
}
+
+/*
+ * create_unique_paths
+ * Build a new RelOptInfo containing Paths that represent elimination of
+ * distinct rows from the input data. Distinct-ness is defined according to
+ * the needs of the semijoin represented by sjinfo. If it is not possible
+ * to identify how to make the data unique, NULL is returned.
+ *
+ * If used at all, this is likely to be called repeatedly on the same rel,
+ * so we cache the result.
+ */
+RelOptInfo *
+create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *unique_rel;
+ List *sortPathkeys = NIL;
+ List *groupClause = NIL;
+ MemoryContext oldcontext;
+
+ /* Caller made a mistake if SpecialJoinInfo is the wrong one */
+ Assert(sjinfo->jointype == JOIN_SEMI);
+ Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
+
+ /* If result already cached, return it */
+ if (rel->unique_rel)
+ return rel->unique_rel;
+
+ /* If it's not possible to unique-ify, return NULL */
+ if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
+ return NULL;
+
+ /*
+ * When called during GEQO join planning, we are in a short-lived memory
+ * context. We must make sure that the unique rel and any subsidiary data
+ * structures created for a baserel survive the GEQO cycle, else the
+ * baserel is trashed for future GEQO cycles. On the other hand, when we
+ * are creating those for a joinrel during GEQO, we don't want them to
+ * clutter the main planning context. Upshot is that the best solution is
+ * to explicitly allocate memory in the same context the given RelOptInfo
+ * is in.
+ */
+ oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+
+ unique_rel = makeNode(RelOptInfo);
+ memcpy(unique_rel, rel, sizeof(RelOptInfo));
+
+ /* Clear path info */
+ unique_rel->pathlist = NIL;
+ unique_rel->ppilist = NIL;
+ unique_rel->partial_pathlist = NIL;
+ unique_rel->cheapest_startup_path = NULL;
+ unique_rel->cheapest_total_path = NULL;
+ unique_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * Build the target list for the unique rel. We also build the pathkeys
+ * that represent the ordering requirements for the sort-based
+ * implementation, and the list of SortGroupClause nodes that represent
+ * the columns to be grouped on for the hash-based implementation.
+ *
+ * For a child rel, we can construct these fields from those of its
+ * parent.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ PathTarget *child_unique_target;
+ PathTarget *parent_unique_target;
+
+ parent_unique_target = rel->top_parent->unique_rel->reltarget;
+
+ child_unique_target = copy_pathtarget(parent_unique_target);
+
+ /* Translate the target expressions */
+ child_unique_target->exprs = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) parent_unique_target->exprs,
+ rel,
+ rel->top_parent);
+
+ unique_rel->reltarget = child_unique_target;
+
+ sortPathkeys = rel->top_parent->unique_pathkeys;
+ groupClause = rel->top_parent->unique_groupclause;
+ }
+ else
+ {
+ List *newtlist;
+ int nextresno;
+ List *sortList = NIL;
+ ListCell *lc1;
+ ListCell *lc2;
+
+ /*
+ * The values we are supposed to unique-ify may be expressions in the
+ * variables of the input rel's targetlist. We have to add any such
+ * expressions to the unique rel's targetlist.
+ *
+ * While in the loop, build the lists of SortGroupClause's that
+ * represent the ordering for the sort-based implementation and the
+ * grouping for the hash-based implementation.
+ */
+ newtlist = make_tlist_from_pathtarget(rel->reltarget);
+ nextresno = list_length(newtlist) + 1;
+
+ forboth(lc1, sjinfo->semi_rhs_exprs, lc2, sjinfo->semi_operators)
+ {
+ Expr *uniqexpr = lfirst(lc1);
+ Oid in_oper = lfirst_oid(lc2);
+ Oid sortop = InvalidOid;
+ TargetEntry *tle;
+
+ tle = tlist_member(uniqexpr, newtlist);
+ if (!tle)
+ {
+ tle = makeTargetEntry((Expr *) uniqexpr,
+ nextresno,
+ NULL,
+ false);
+ newtlist = lappend(newtlist, tle);
+ nextresno++;
+ }
+
+ if (sjinfo->semi_can_btree)
+ {
+ /* Create an ORDER BY list to sort the input compatibly */
+ Oid eqop;
+ SortGroupClause *sortcl;
+
+ sortop = get_ordering_op_for_equality_op(in_oper, false);
+ if (!OidIsValid(sortop)) /* shouldn't happen */
+ elog(ERROR, "could not find ordering operator for equality operator %u",
+ in_oper);
+
+ /*
+ * The Unique node will need equality operators. Normally
+ * these are the same as the IN clause operators, but if those
+ * are cross-type operators then the equality operators are
+ * the ones for the IN clause operators' RHS datatype.
+ */
+ eqop = get_equality_op_for_ordering_op(sortop, NULL);
+ if (!OidIsValid(eqop)) /* shouldn't happen */
+ elog(ERROR, "could not find equality operator for ordering operator %u",
+ sortop);
+
+ sortcl = makeNode(SortGroupClause);
+ sortcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ sortcl->eqop = eqop;
+ sortcl->sortop = sortop;
+ sortcl->reverse_sort = false;
+ sortcl->nulls_first = false;
+ sortcl->hashable = false; /* no need to make this accurate */
+ sortList = lappend(sortList, sortcl);
+ }
+ if (sjinfo->semi_can_hash)
+ {
+ /* Create a GROUP BY list for the Agg node to use */
+ Oid eq_oper;
+ SortGroupClause *groupcl;
+
+ /*
+ * Get the hashable equality operators for the Agg node to
+ * use. Normally these are the same as the IN clause
+ * operators, but if those are cross-type operators then the
+ * equality operators are the ones for the IN clause
+ * operators' RHS datatype.
+ */
+ if (!get_compatible_hash_operators(in_oper, NULL, &eq_oper))
+ elog(ERROR, "could not find compatible hash operator for operator %u",
+ in_oper);
+
+ groupcl = makeNode(SortGroupClause);
+ groupcl->tleSortGroupRef = assignSortGroupRef(tle, newtlist);
+ groupcl->eqop = eq_oper;
+ groupcl->sortop = sortop;
+ groupcl->reverse_sort = false;
+ groupcl->nulls_first = false;
+ groupcl->hashable = true;
+ groupClause = lappend(groupClause, groupcl);
+ }
+ }
+
+ unique_rel->reltarget = create_pathtarget(root, newtlist);
+ sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist);
+ }
+
+ /* build unique paths based on input rel's pathlist */
+ create_final_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* build unique paths based on input rel's partial_pathlist */
+ create_partial_unique_paths(root, rel, sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+
+ /* Now choose the best path(s) */
+ set_cheapest(unique_rel);
+
+ /*
+ * There shouldn't be any partial paths for the unique relation;
+ * otherwise, we won't be able to properly guarantee uniqueness.
+ */
+ Assert(unique_rel->partial_pathlist == NIL);
+
+ /* Cache the result */
+ rel->unique_rel = unique_rel;
+ rel->unique_pathkeys = sortPathkeys;
+ rel->unique_groupclause = groupClause;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return unique_rel;
+}
+
+/*
+ * create_final_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' pathlist
+ */
+static void
+create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ Path *cheapest_input_path = input_rel->cheapest_total_path;
+
+ /* Estimate number of output rows */
+ unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_input_path->rows,
+ NULL,
+ NULL);
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, input_rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_input_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_input_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ input_path,
+ unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, unique_rel, path,
+ list_length(sortPathkeys),
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ unique_rel,
+ cheapest_input_path,
+ unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ unique_rel,
+ path,
+ cheapest_input_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ unique_rel->rows);
+
+ add_path(unique_rel, path);
+ }
+}
+
+/*
+ * create_partial_unique_paths
+ * Create unique paths in 'unique_rel' based on 'input_rel' partial_pathlist
+ */
+static void
+create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
+ List *sortPathkeys, List *groupClause,
+ SpecialJoinInfo *sjinfo, RelOptInfo *unique_rel)
+{
+ RelOptInfo *partial_unique_rel;
+ Path *cheapest_partial_path;
+
+ /* nothing to do when there are no partial paths in the input rel */
+ if (!input_rel->consider_parallel || input_rel->partial_pathlist == NIL)
+ return;
+
+ /*
+ * Nothing to do if there's anything in the targetlist that's
+ * parallel-restricted.
+ */
+ if (!is_parallel_safe(root, (Node *) unique_rel->reltarget->exprs))
+ return;
+
+ cheapest_partial_path = linitial(input_rel->partial_pathlist);
+
+ partial_unique_rel = makeNode(RelOptInfo);
+ memcpy(partial_unique_rel, input_rel, sizeof(RelOptInfo));
+
+ /* Clear path info */
+ partial_unique_rel->pathlist = NIL;
+ partial_unique_rel->ppilist = NIL;
+ partial_unique_rel->partial_pathlist = NIL;
+ partial_unique_rel->cheapest_startup_path = NULL;
+ partial_unique_rel->cheapest_total_path = NULL;
+ partial_unique_rel->cheapest_parameterized_paths = NIL;
+
+ /* Estimate number of output rows */
+ partial_unique_rel->rows = estimate_num_groups(root,
+ sjinfo->semi_rhs_exprs,
+ cheapest_partial_path->rows,
+ NULL,
+ NULL);
+ partial_unique_rel->reltarget = unique_rel->reltarget;
+
+ /* Consider sort-based implementations, if possible. */
+ if (sjinfo->semi_can_btree)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest partial path and incremental sort on any paths
+ * with presorted keys.
+ */
+ foreach(lc, input_rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(sortPathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ input_path,
+ partial_unique_rel->reltarget);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ partial_unique_rel,
+ path,
+ sortPathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ path = (Path *) create_unique_path(root, partial_unique_rel, path,
+ list_length(sortPathkeys),
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+ }
+
+ /* Consider hash-based implementation, if possible. */
+ if (sjinfo->semi_can_hash)
+ {
+ Path *path;
+
+ /*
+ * Make a separate ProjectionPath in case we need a Result node.
+ */
+ path = (Path *) create_projection_path(root,
+ partial_unique_rel,
+ cheapest_partial_path,
+ partial_unique_rel->reltarget);
+
+ path = (Path *) create_agg_path(root,
+ partial_unique_rel,
+ path,
+ cheapest_partial_path->pathtarget,
+ AGG_HASHED,
+ AGGSPLIT_SIMPLE,
+ groupClause,
+ NIL,
+ NULL,
+ partial_unique_rel->rows);
+
+ add_partial_path(partial_unique_rel, path);
+ }
+
+ if (partial_unique_rel->partial_pathlist != NIL)
+ {
+ generate_useful_gather_paths(root, partial_unique_rel, true);
+ set_cheapest(partial_unique_rel);
+
+ /*
+ * Finally, create paths to unique-ify the final result. This step is
+ * needed to remove any duplicates due to combining rows from parallel
+ * workers.
+ */
+ create_final_unique_paths(root, partial_unique_rel,
+ sortPathkeys, groupClause,
+ sjinfo, unique_rel);
+ }
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index eab44da65b8..28a4ae64440 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -929,11 +929,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
@@ -946,11 +946,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
make_pathkeys_for_sortclauses(root, groupList, tlist),
-1.0);
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(path->pathkeys),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(path->pathkeys),
+ dNumGroups);
add_path(result_rel, path);
}
}
@@ -970,11 +970,11 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
NULL);
/* and make the MergeAppend unique */
- path = (Path *) create_upper_unique_path(root,
- result_rel,
- path,
- list_length(tlist),
- dNumGroups);
+ path = (Path *) create_unique_path(root,
+ result_rel,
+ path,
+ list_length(tlist),
+ dNumGroups);
add_path(result_rel, path);
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4c5867cdcb..b0da28150d3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -46,7 +46,6 @@ typedef enum
*/
#define STD_FUZZ_FACTOR 1.01
-static List *translate_sub_tlist(List *tlist, int relid);
static int append_total_cost_compare(const ListCell *a, const ListCell *b);
static int append_startup_cost_compare(const ListCell *a, const ListCell *b);
static List *reparameterize_pathlist_by_child(PlannerInfo *root,
@@ -381,7 +380,6 @@ set_cheapest(RelOptInfo *parent_rel)
parent_rel->cheapest_startup_path = cheapest_startup_path;
parent_rel->cheapest_total_path = cheapest_total_path;
- parent_rel->cheapest_unique_path = NULL; /* computed only if needed */
parent_rel->cheapest_parameterized_paths = parameterized_paths;
}
@@ -1740,246 +1738,6 @@ create_memoize_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * create_unique_path
- * Creates a path representing elimination of distinct rows from the
- * input data. Distinct-ness is defined according to the needs of the
- * semijoin represented by sjinfo. If it is not possible to identify
- * how to make the data unique, NULL is returned.
- *
- * If used at all, this is likely to be called repeatedly on the same rel;
- * and the input subpath should always be the same (the cheapest_total path
- * for the rel). So we cache the result.
- */
-UniquePath *
-create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
- SpecialJoinInfo *sjinfo)
-{
- UniquePath *pathnode;
- Path sort_path; /* dummy for result of cost_sort */
- Path agg_path; /* dummy for result of cost_agg */
- MemoryContext oldcontext;
- int numCols;
-
- /* Caller made a mistake if subpath isn't cheapest_total ... */
- Assert(subpath == rel->cheapest_total_path);
- Assert(subpath->parent == rel);
- /* ... or if SpecialJoinInfo is the wrong one */
- Assert(sjinfo->jointype == JOIN_SEMI);
- Assert(bms_equal(rel->relids, sjinfo->syn_righthand));
-
- /* If result already cached, return it */
- if (rel->cheapest_unique_path)
- return (UniquePath *) rel->cheapest_unique_path;
-
- /* If it's not possible to unique-ify, return NULL */
- if (!(sjinfo->semi_can_btree || sjinfo->semi_can_hash))
- return NULL;
-
- /*
- * When called during GEQO join planning, we are in a short-lived memory
- * context. We must make sure that the path and any subsidiary data
- * structures created for a baserel survive the GEQO cycle, else the
- * baserel is trashed for future GEQO cycles. On the other hand, when we
- * are creating those for a joinrel during GEQO, we don't want them to
- * clutter the main planning context. Upshot is that the best solution is
- * to explicitly allocate memory in the same context the given RelOptInfo
- * is in.
- */
- oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
-
- pathnode = makeNode(UniquePath);
-
- pathnode->path.pathtype = T_Unique;
- pathnode->path.parent = rel;
- pathnode->path.pathtarget = rel->reltarget;
- pathnode->path.param_info = subpath->param_info;
- pathnode->path.parallel_aware = false;
- pathnode->path.parallel_safe = rel->consider_parallel &&
- subpath->parallel_safe;
- pathnode->path.parallel_workers = subpath->parallel_workers;
-
- /*
- * Assume the output is unsorted, since we don't necessarily have pathkeys
- * to represent it. (This might get overridden below.)
- */
- pathnode->path.pathkeys = NIL;
-
- pathnode->subpath = subpath;
-
- /*
- * Under GEQO and when planning child joins, the sjinfo might be
- * short-lived, so we'd better make copies of data structures we extract
- * from it.
- */
- pathnode->in_operators = copyObject(sjinfo->semi_operators);
- pathnode->uniq_exprs = copyObject(sjinfo->semi_rhs_exprs);
-
- /*
- * If the input is a relation and it has a unique index that proves the
- * semi_rhs_exprs are unique, then we don't need to do anything. Note
- * that relation_has_unique_index_for automatically considers restriction
- * clauses for the rel, as well.
- */
- if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
- sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
-
- /*
- * If the input is a subquery whose output must be unique already, then we
- * don't need to do anything. The test for uniqueness has to consider
- * exactly which columns we are extracting; for example "SELECT DISTINCT
- * x,y" doesn't guarantee that x alone is distinct. So we cannot check for
- * this optimization unless semi_rhs_exprs consists only of simple Vars
- * referencing subquery outputs. (Possibly we could do something with
- * expressions in the subquery outputs, too, but for now keep it simple.)
- */
- if (rel->rtekind == RTE_SUBQUERY)
- {
- RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
-
- if (query_supports_distinctness(rte->subquery))
- {
- List *sub_tlist_colnos;
-
- sub_tlist_colnos = translate_sub_tlist(sjinfo->semi_rhs_exprs,
- rel->relid);
-
- if (sub_tlist_colnos &&
- query_is_distinct_for(rte->subquery,
- sub_tlist_colnos,
- sjinfo->semi_operators))
- {
- pathnode->umethod = UNIQUE_PATH_NOOP;
- pathnode->path.rows = rel->rows;
- pathnode->path.disabled_nodes = subpath->disabled_nodes;
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost = subpath->total_cost;
- pathnode->path.pathkeys = subpath->pathkeys;
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
- }
- }
- }
-
- /* Estimate number of output rows */
- pathnode->path.rows = estimate_num_groups(root,
- sjinfo->semi_rhs_exprs,
- rel->rows,
- NULL,
- NULL);
- numCols = list_length(sjinfo->semi_rhs_exprs);
-
- if (sjinfo->semi_can_btree)
- {
- /*
- * Estimate cost for sort+unique implementation
- */
- cost_sort(&sort_path, root, NIL,
- subpath->disabled_nodes,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width,
- 0.0,
- work_mem,
- -1.0);
-
- /*
- * Charge one cpu_operator_cost per comparison per input tuple. We
- * assume all columns get compared at most of the tuples. (XXX
- * probably this is an overestimate.) This should agree with
- * create_upper_unique_path.
- */
- sort_path.total_cost += cpu_operator_cost * rel->rows * numCols;
- }
-
- if (sjinfo->semi_can_hash)
- {
- /*
- * Estimate the overhead per hashtable entry at 64 bytes (same as in
- * planner.c).
- */
- int hashentrysize = subpath->pathtarget->width + 64;
-
- if (hashentrysize * pathnode->path.rows > get_hash_memory_limit())
- {
- /*
- * We should not try to hash. Hack the SpecialJoinInfo to
- * remember this, in case we come through here again.
- */
- sjinfo->semi_can_hash = false;
- }
- else
- cost_agg(&agg_path, root,
- AGG_HASHED, NULL,
- numCols, pathnode->path.rows,
- NIL,
- subpath->disabled_nodes,
- subpath->startup_cost,
- subpath->total_cost,
- rel->rows,
- subpath->pathtarget->width);
- }
-
- if (sjinfo->semi_can_btree && sjinfo->semi_can_hash)
- {
- if (agg_path.disabled_nodes < sort_path.disabled_nodes ||
- (agg_path.disabled_nodes == sort_path.disabled_nodes &&
- agg_path.total_cost < sort_path.total_cost))
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- pathnode->umethod = UNIQUE_PATH_SORT;
- }
- else if (sjinfo->semi_can_btree)
- pathnode->umethod = UNIQUE_PATH_SORT;
- else if (sjinfo->semi_can_hash)
- pathnode->umethod = UNIQUE_PATH_HASH;
- else
- {
- /* we can get here only if we abandoned hashing above */
- MemoryContextSwitchTo(oldcontext);
- return NULL;
- }
-
- if (pathnode->umethod == UNIQUE_PATH_HASH)
- {
- pathnode->path.disabled_nodes = agg_path.disabled_nodes;
- pathnode->path.startup_cost = agg_path.startup_cost;
- pathnode->path.total_cost = agg_path.total_cost;
- }
- else
- {
- pathnode->path.disabled_nodes = sort_path.disabled_nodes;
- pathnode->path.startup_cost = sort_path.startup_cost;
- pathnode->path.total_cost = sort_path.total_cost;
- }
-
- rel->cheapest_unique_path = (Path *) pathnode;
-
- MemoryContextSwitchTo(oldcontext);
-
- return pathnode;
-}
-
/*
* create_gather_merge_path
*
@@ -2031,36 +1789,6 @@ create_gather_merge_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
return pathnode;
}
-/*
- * translate_sub_tlist - get subquery column numbers represented by tlist
- *
- * The given targetlist usually contains only Vars referencing the given relid.
- * Extract their varattnos (ie, the column numbers of the subquery) and return
- * as an integer List.
- *
- * If any of the tlist items is not a simple Var, we cannot determine whether
- * the subquery's uniqueness condition (if any) matches ours, so punt and
- * return NIL.
- */
-static List *
-translate_sub_tlist(List *tlist, int relid)
-{
- List *result = NIL;
- ListCell *l;
-
- foreach(l, tlist)
- {
- Var *var = (Var *) lfirst(l);
-
- if (!var || !IsA(var, Var) ||
- var->varno != relid)
- return NIL; /* punt */
-
- result = lappend_int(result, var->varattno);
- }
- return result;
-}
-
/*
* create_gather_path
* Creates a path corresponding to a gather scan, returning the
@@ -2818,8 +2546,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3074,8 +2801,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3122,8 +2848,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3199,13 +2924,10 @@ create_group_path(PlannerInfo *root,
}
/*
- * create_upper_unique_path
+ * create_unique_path
* Creates a pathnode that represents performing an explicit Unique step
* on presorted input.
*
- * This produces a Unique plan node, but the use-case is so different from
- * create_unique_path that it doesn't seem worth trying to merge the two.
- *
* 'rel' is the parent relation associated with the result
* 'subpath' is the path representing the source of data
* 'numCols' is the number of grouping columns
@@ -3214,21 +2936,20 @@ create_group_path(PlannerInfo *root,
* The input path must be sorted on the grouping columns, plus possibly
* additional columns; so the first numCols pathkeys are the grouping columns
*/
-UpperUniquePath *
-create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups)
+UniquePath *
+create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups)
{
- UpperUniquePath *pathnode = makeNode(UpperUniquePath);
+ UniquePath *pathnode = makeNode(UniquePath);
pathnode->path.pathtype = T_Unique;
pathnode->path.parent = rel;
/* Unique doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3284,8 +3005,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..0e523d2eb5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -217,7 +217,6 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->partial_pathlist = NIL;
rel->cheapest_startup_path = NULL;
rel->cheapest_total_path = NULL;
- rel->cheapest_unique_path = NULL;
rel->cheapest_parameterized_paths = NIL;
rel->relid = relid;
rel->rtekind = rte->rtekind;
@@ -269,6 +268,9 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->fdw_private = NULL;
rel->unique_for_rels = NIL;
rel->non_unique_for_rels = NIL;
+ rel->unique_rel = NULL;
+ rel->unique_pathkeys = NIL;
+ rel->unique_groupclause = NIL;
rel->baserestrictinfo = NIL;
rel->baserestrictcost.startup = 0;
rel->baserestrictcost.per_tuple = 0;
@@ -713,7 +715,6 @@ build_join_rel(PlannerInfo *root,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
/* init direct_lateral_relids from children; we'll finish it up below */
joinrel->direct_lateral_relids =
@@ -748,6 +749,9 @@ build_join_rel(PlannerInfo *root,
joinrel->fdw_private = NULL;
joinrel->unique_for_rels = NIL;
joinrel->non_unique_for_rels = NIL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -906,7 +910,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->partial_pathlist = NIL;
joinrel->cheapest_startup_path = NULL;
joinrel->cheapest_total_path = NULL;
- joinrel->cheapest_unique_path = NULL;
joinrel->cheapest_parameterized_paths = NIL;
joinrel->direct_lateral_relids = NULL;
joinrel->lateral_relids = NULL;
@@ -933,6 +936,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->useridiscurrent = false;
joinrel->fdwroutine = NULL;
joinrel->fdw_private = NULL;
+ joinrel->unique_rel = NULL;
+ joinrel->unique_pathkeys = NIL;
+ joinrel->unique_groupclause = NIL;
joinrel->baserestrictinfo = NIL;
joinrel->baserestrictcost.startup = 0;
joinrel->baserestrictcost.per_tuple = 0;
@@ -1488,7 +1494,6 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->pathlist = NIL;
upperrel->cheapest_startup_path = NULL;
upperrel->cheapest_total_path = NULL;
- upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index b2dc380b57b..fb3957e75e5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -323,8 +323,8 @@ typedef enum JoinType
* These codes are used internally in the planner, but are not supported
* by the executor (nor, indeed, by most of the planner).
*/
- JOIN_UNIQUE_OUTER, /* LHS path must be made unique */
- JOIN_UNIQUE_INNER, /* RHS path must be made unique */
+ JOIN_UNIQUE_OUTER, /* LHS has to be made unique */
+ JOIN_UNIQUE_INNER, /* RHS has to be made unique */
/*
* We might need additional join types someday.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad2726f026f..4a903d1ec18 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -703,8 +703,6 @@ typedef struct PartitionSchemeData *PartitionScheme;
* (regardless of ordering) among the unparameterized paths;
* or if there is no unparameterized path, the path with lowest
* total cost among the paths with minimum parameterization
- * cheapest_unique_path - for caching cheapest path to produce unique
- * (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
* always includes cheapest_total_path, even if that's unparameterized
* direct_lateral_relids - rels this rel has direct LATERAL references to
@@ -770,6 +768,21 @@ typedef struct PartitionSchemeData *PartitionScheme;
* other rels for which we have tried and failed to prove
* this one unique
*
+ * Three fields are used to cache information about unique-ification of this
+ * relation. This is used to support semijoins where the relation appears on
+ * the RHS: the relation is first unique-ified, and then a regular join is
+ * performed:
+ *
+ * unique_rel - the unique-ified version of the relation, containing paths
+ * that produce unique (no duplicates) output from relation;
+ * NULL if not yet requested
+ * unique_pathkeys - pathkeys that represent the ordering requirements for
+ * the relation's output in sort-based unique-ification
+ * implementations
+ * unique_groupclause - a list of SortGroupClause nodes that represent the
+ * columns to be grouped on in hash-based unique-ification
+ * implementations
+ *
* The presence of the following fields depends on the restrictions
* and joins that the relation participates in:
*
@@ -930,7 +943,6 @@ typedef struct RelOptInfo
List *partial_pathlist; /* partial Paths */
struct Path *cheapest_startup_path;
struct Path *cheapest_total_path;
- struct Path *cheapest_unique_path;
List *cheapest_parameterized_paths;
/*
@@ -1004,6 +1016,16 @@ typedef struct RelOptInfo
/* known not unique for these set(s) */
List *non_unique_for_rels;
+ /*
+ * information about unique-ification of this relation
+ */
+ /* the unique-ified version of the relation */
+ struct RelOptInfo *unique_rel;
+ /* pathkeys for sort-based unique-ification implementations */
+ List *unique_pathkeys;
+ /* SortGroupClause nodes for hash-based unique-ification implementations */
+ List *unique_groupclause;
+
/*
* used by various scans and joins:
*/
@@ -1097,6 +1119,17 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is given relation unique-ified?
+ *
+ * If the nominal jointype is JOIN_INNER while sjinfo->jointype is JOIN_SEMI,
+ * and the given rel is exactly the RHS of that semijoin, then the rel has
+ * been unique-ified.
+ */
+#define RELATION_WAS_MADE_UNIQUE(rel, sjinfo, nominal_jointype) \
+ ((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
+ bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -1741,8 +1774,8 @@ typedef struct ParamPathInfo
* and the specified outer rel(s).
*
* "rows" is the same as parent->rows in simple paths, but in parameterized
- * paths and UniquePaths it can be less than parent->rows, reflecting the
- * fact that we've filtered by extra join conditions or removed duplicates.
+ * paths it can be less than parent->rows, reflecting the fact that we've
+ * filtered by extra join conditions.
*
* "pathkeys" is a List of PathKey nodes (see above), describing the sort
* ordering of the path's output rows.
@@ -2141,34 +2174,6 @@ typedef struct MemoizePath
double est_hit_ratio; /* estimated cache hit ratio, for EXPLAIN */
} MemoizePath;
-/*
- * UniquePath represents elimination of distinct rows from the output of
- * its subpath.
- *
- * This can represent significantly different plans: either hash-based or
- * sort-based implementation, or a no-op if the input path can be proven
- * distinct already. The decision is sufficiently localized that it's not
- * worth having separate Path node types. (Note: in the no-op case, we could
- * eliminate the UniquePath node entirely and just return the subpath; but
- * it's convenient to have a UniquePath in the path tree to signal upper-level
- * routines that the input is known distinct.)
- */
-typedef enum UniquePathMethod
-{
- UNIQUE_PATH_NOOP, /* input is known unique already */
- UNIQUE_PATH_HASH, /* use hashing */
- UNIQUE_PATH_SORT, /* use sorting */
-} UniquePathMethod;
-
-typedef struct UniquePath
-{
- Path path;
- Path *subpath;
- UniquePathMethod umethod;
- List *in_operators; /* equality operators of the IN clause */
- List *uniq_exprs; /* expressions to be made unique */
-} UniquePath;
-
/*
* GatherPath runs several copies of a plan in parallel and collects the
* results. The parallel leader may also execute the plan, unless the
@@ -2375,17 +2380,17 @@ typedef struct GroupPath
} GroupPath;
/*
- * UpperUniquePath represents adjacent-duplicate removal (in presorted input)
+ * UniquePath represents adjacent-duplicate removal (in presorted input)
*
* The columns to be compared are the first numkeys columns of the path's
* pathkeys. The input is presumed already sorted that way.
*/
-typedef struct UpperUniquePath
+typedef struct UniquePath
{
Path path;
Path *subpath; /* path representing input source */
int numkeys; /* number of pathkey columns to compare */
-} UpperUniquePath;
+} UniquePath;
/*
* AggPath represents generic computation of aggregate functions
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 58936e963cb..763cd25bb3c 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -91,8 +91,6 @@ extern MemoizePath *create_memoize_path(PlannerInfo *root,
bool singlerow,
bool binary_mode,
Cardinality est_calls);
-extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
- Path *subpath, SpecialJoinInfo *sjinfo);
extern GatherPath *create_gather_path(PlannerInfo *root,
RelOptInfo *rel, Path *subpath, PathTarget *target,
Relids required_outer, double *rows);
@@ -223,11 +221,11 @@ extern GroupPath *create_group_path(PlannerInfo *root,
List *groupClause,
List *qual,
double numGroups);
-extern UpperUniquePath *create_upper_unique_path(PlannerInfo *root,
- RelOptInfo *rel,
- Path *subpath,
- int numCols,
- double numGroups);
+extern UniquePath *create_unique_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ Path *subpath,
+ int numCols,
+ double numGroups);
extern AggPath *create_agg_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 347c582a789..f220e9a270d 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -59,4 +59,7 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,
extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);
+extern RelOptInfo *create_unique_paths(PlannerInfo *root, RelOptInfo *rel,
+ SpecialJoinInfo *sjinfo);
+
#endif /* PLANNER_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4d5d35d0727..98b05c94a11 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -9468,23 +9468,20 @@ where exists (select 1 from tenk1 t3
---------------------------------------------------------------------------------
Nested Loop
Output: t1.unique1, t2.hundred
- -> Hash Join
+ -> Merge Join
Output: t1.unique1, t3.tenthous
- Hash Cond: (t3.thousand = t1.unique1)
- -> HashAggregate
+ Merge Cond: (t3.thousand = t1.unique1)
+ -> Unique
Output: t3.thousand, t3.tenthous
- Group Key: t3.thousand, t3.tenthous
-> Index Only Scan using tenk1_thous_tenthous on public.tenk1 t3
Output: t3.thousand, t3.tenthous
- -> Hash
+ -> Index Only Scan using onek_unique1 on public.onek t1
Output: t1.unique1
- -> Index Only Scan using onek_unique1 on public.onek t1
- Output: t1.unique1
- Index Cond: (t1.unique1 < 1)
+ Index Cond: (t1.unique1 < 1)
-> Index Only Scan using tenk1_hundred on public.tenk1 t2
Output: t2.hundred
Index Cond: (t2.hundred = t3.tenthous)
-(18 rows)
+(15 rows)
-- ... unless it actually is unique
create table j3 as select unique1, tenthous from onek;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index d5368186caa..24e06845f92 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1134,48 +1134,50 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- Join Filter: (t1_2.a = t1_5.b)
- -> HashAggregate
- Group Key: t1_5.b
+ -> Nested Loop
+ Join Filter: (t1_2.a = t1_5.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_5.b
-> Hash Join
Hash Cond: (((t2_1.a + t2_1.b) / 2) = t1_5.b)
-> Seq Scan on prt1_e_p1 t2_1
-> Hash
-> Seq Scan on prt2_p1 t1_5
Filter: (a = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
- Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_3.a = t1_6.b)
- -> HashAggregate
- Group Key: t1_6.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_2
+ Index Cond: (a = ((t2_1.a + t2_1.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_3.a = t1_6.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Join
Hash Cond: (((t2_2.a + t2_2.b) / 2) = t1_6.b)
-> Seq Scan on prt1_e_p2 t2_2
-> Hash
-> Seq Scan on prt2_p2 t1_6
Filter: (a = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
- Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
- Filter: (b = 0)
- -> Nested Loop
- Join Filter: (t1_4.a = t1_7.b)
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_3
+ Index Cond: (a = ((t2_2.a + t2_2.b) / 2))
+ Filter: (b = 0)
+ -> Nested Loop
+ Join Filter: (t1_4.a = t1_7.b)
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Nested Loop
-> Seq Scan on prt2_p3 t1_7
Filter: (a = 0)
-> Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t2_3
Index Cond: (((a + b) / 2) = t1_7.b)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
- Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
- Filter: (b = 0)
-(41 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_4
+ Index Cond: (a = ((t2_3.a + t2_3.b) / 2))
+ Filter: (b = 0)
+(43 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1, prt1_e t2 WHERE t1.a = 0 AND t1.b = (t2.a + t2.b)/2) AND t1.b = 0 ORDER BY t1.a;
a | b | c
@@ -1190,46 +1192,48 @@ EXPLAIN (COSTS OFF)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
---------------------------------------------------------------------------
- Sort
+ Merge Append
Sort Key: t1.a
- -> Append
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_6.b
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_6.b
-> Hash Semi Join
Hash Cond: (t1_6.b = ((t1_9.a + t1_9.b) / 2))
-> Seq Scan on prt2_p1 t1_6
-> Hash
-> Seq Scan on prt1_e_p1 t1_9
Filter: (c = 0)
- -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
- Index Cond: (a = t1_6.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_7.b
+ -> Index Scan using iprt1_p1_a on prt1_p1 t1_3
+ Index Cond: (a = t1_6.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_7.b
-> Hash Semi Join
Hash Cond: (t1_7.b = ((t1_10.a + t1_10.b) / 2))
-> Seq Scan on prt2_p2 t1_7
-> Hash
-> Seq Scan on prt1_e_p2 t1_10
Filter: (c = 0)
- -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
- Index Cond: (a = t1_7.b)
- Filter: (b = 0)
- -> Nested Loop
- -> HashAggregate
- Group Key: t1_8.b
+ -> Index Scan using iprt1_p2_a on prt1_p2 t1_4
+ Index Cond: (a = t1_7.b)
+ Filter: (b = 0)
+ -> Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: t1_8.b
-> Hash Semi Join
Hash Cond: (t1_8.b = ((t1_11.a + t1_11.b) / 2))
-> Seq Scan on prt2_p3 t1_8
-> Hash
-> Seq Scan on prt1_e_p3 t1_11
Filter: (c = 0)
- -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
- Index Cond: (a = t1_8.b)
- Filter: (b = 0)
-(39 rows)
+ -> Index Scan using iprt1_p3_a on prt1_p3 t1_5
+ Index Cond: (a = t1_8.b)
+ Filter: (b = 0)
+(41 rows)
SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (SELECT (t1.a + t1.b)/2 FROM prt1_e t1 WHERE t1.c = 0)) AND t1.b = 0 ORDER BY t1.a;
a | b | c
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 18fed63e738..0563d0cd5a1 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -707,6 +707,212 @@ select * from numeric_table
3
(4 rows)
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Index Cond: (t2.a = t3.b)
+(18 rows)
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Incremental Sort
+ Output: t1.a, t1.b, t2.a, t2.b
+ Sort Key: t1.a, t2.a
+ Presorted Key: t1.a
+ -> Merge Join
+ Output: t1.a, t1.b, t2.a, t2.b
+ Merge Cond: (t1.a = ((t3.a + 1)))
+ -> Index Only Scan using semijoin_unique_tbl_a_b_idx on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Sort
+ Output: t2.a, t2.b, t3.a, ((t3.a + 1))
+ Sort Key: ((t3.a + 1))
+ -> Hash Join
+ Output: t2.a, t2.b, t3.a, (t3.a + 1)
+ Hash Cond: (t2.a = (t3.b + 1))
+ -> Seq Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ -> Hash
+ Output: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: (t3.a + 1), (t3.b + 1)
+ -> Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b, (t3.a + 1), (t3.b + 1)
+(24 rows)
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+set enable_indexscan to off;
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Nested Loop
+ Output: t1.a, t1.b, t2.a, t2.b
+ -> Merge Join
+ Output: t1.a, t1.b, t3.b
+ Merge Cond: (t3.a = t1.a)
+ -> Unique
+ Output: t3.a, t3.b
+ -> Gather Merge
+ Output: t3.a, t3.b
+ Workers Planned: 2
+ -> Sort
+ Output: t3.a, t3.b
+ Sort Key: t3.a, t3.b
+ -> HashAggregate
+ Output: t3.a, t3.b
+ Group Key: t3.a, t3.b
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t3
+ Output: t3.a, t3.b
+ -> Materialize
+ Output: t1.a, t1.b
+ -> Gather Merge
+ Output: t1.a, t1.b
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, t1.b
+ Sort Key: t1.a
+ -> Parallel Seq Scan on public.semijoin_unique_tbl t1
+ Output: t1.a, t1.b
+ -> Memoize
+ Output: t2.a, t2.b
+ Cache Key: t3.b
+ Cache Mode: logical
+ -> Bitmap Heap Scan on public.semijoin_unique_tbl t2
+ Output: t2.a, t2.b
+ Recheck Cond: (t2.a = t3.b)
+ -> Bitmap Index Scan on semijoin_unique_tbl_a_b_idx
+ Index Cond: (t2.a = t3.b)
+(37 rows)
+
+reset enable_indexscan;
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+drop table semijoin_unique_tbl;
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+set enable_partitionwise_join to on;
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: t1.a
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b
+ -> Nested Loop
+ Output: t1_1.a, t1_1.b, t3_1.a
+ -> Unique
+ Output: t3_1.a
+ -> Index Only Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t3_1
+ Output: t3_1.a
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t1_1
+ Output: t1_1.a, t1_1.b
+ Index Cond: (t1_1.a = t3_1.a)
+ -> Memoize
+ Output: t2_1.a, t2_1.b
+ Cache Key: t1_1.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p1_a_idx on public.unique_tbl_p1 t2_1
+ Output: t2_1.a, t2_1.b
+ Index Cond: (t2_1.a = t1_1.a)
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t2_2.a, t2_2.b
+ -> Nested Loop
+ Output: t1_2.a, t1_2.b, t3_2.a
+ -> Unique
+ Output: t3_2.a
+ -> Index Only Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t3_2
+ Output: t3_2.a
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t1_2
+ Output: t1_2.a, t1_2.b
+ Index Cond: (t1_2.a = t3_2.a)
+ -> Memoize
+ Output: t2_2.a, t2_2.b
+ Cache Key: t1_2.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p2_a_idx on public.unique_tbl_p2 t2_2
+ Output: t2_2.a, t2_2.b
+ Index Cond: (t2_2.a = t1_2.a)
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t2_3.a, t2_3.b
+ -> Nested Loop
+ Output: t1_3.a, t1_3.b, t3_3.a
+ -> Unique
+ Output: t3_3.a
+ -> Sort
+ Output: t3_3.a
+ Sort Key: t3_3.a
+ -> Seq Scan on public.unique_tbl_p3 t3_3
+ Output: t3_3.a
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t1_3
+ Output: t1_3.a, t1_3.b
+ Index Cond: (t1_3.a = t3_3.a)
+ -> Memoize
+ Output: t2_3.a, t2_3.b
+ Cache Key: t1_3.a
+ Cache Mode: logical
+ -> Index Scan using unique_tbl_p3_a_idx on public.unique_tbl_p3 t2_3
+ Output: t2_3.a, t2_3.b
+ Index Cond: (t2_3.a = t1_3.a)
+(59 rows)
+
+reset enable_partitionwise_join;
+drop table unique_tbl_p;
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
@@ -2672,18 +2878,17 @@ EXPLAIN (COSTS OFF)
SELECT * FROM onek
WHERE (unique1,ten) IN (VALUES (1,1), (20,0), (99,9), (17,99))
ORDER BY unique1;
- QUERY PLAN
------------------------------------------------------------------
- Sort
- Sort Key: onek.unique1
- -> Nested Loop
- -> HashAggregate
- Group Key: "*VALUES*".column1, "*VALUES*".column2
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop
+ -> Unique
+ -> Sort
+ Sort Key: "*VALUES*".column1, "*VALUES*".column2
-> Values Scan on "*VALUES*"
- -> Index Scan using onek_unique1 on onek
- Index Cond: (unique1 = "*VALUES*".column1)
- Filter: ("*VALUES*".column2 = ten)
-(9 rows)
+ -> Index Scan using onek_unique1 on onek
+ Index Cond: (unique1 = "*VALUES*".column1)
+ Filter: ("*VALUES*".column2 = ten)
+(8 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM onek
@@ -2858,12 +3063,10 @@ SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) ORDER BY 1);
-> Unique
-> Sort
Sort Key: "*VALUES*".column1
- -> Sort
- Sort Key: "*VALUES*".column1
- -> Values Scan on "*VALUES*"
+ -> Values Scan on "*VALUES*"
-> Index Scan using onek_unique1 on onek
Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+(7 rows)
EXPLAIN (COSTS OFF)
SELECT ten FROM onek WHERE unique1 IN (VALUES (1), (2) LIMIT 1);
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index d9a841fbc9f..a6d276a115b 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -361,6 +361,73 @@ select * from float_table
select * from numeric_table
where num_col in (select float_col from float_table);
+--
+-- Test that a semijoin implemented by unique-ifying the RHS can explore
+-- different paths of the RHS rel.
+--
+
+create table semijoin_unique_tbl (a int, b int);
+insert into semijoin_unique_tbl select i%10, i%10 from generate_series(1,1000)i;
+create index on semijoin_unique_tbl(a, b);
+analyze semijoin_unique_tbl;
+
+-- Ensure that we get a plan with Unique + IndexScan
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- Ensure that we can unique-ify expressions more complex than plain Vars
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a+1, b+1 from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+-- encourage use of parallel plans
+set parallel_setup_cost=0;
+set parallel_tuple_cost=0;
+set min_parallel_table_scan_size=0;
+set max_parallel_workers_per_gather=4;
+
+set enable_indexscan to off;
+
+-- Ensure that we get a parallel plan for the unique-ification
+explain (verbose, costs off)
+select * from semijoin_unique_tbl t1, semijoin_unique_tbl t2
+where (t1.a, t2.a) in (select a, b from semijoin_unique_tbl t3)
+order by t1.a, t2.a;
+
+reset enable_indexscan;
+
+reset max_parallel_workers_per_gather;
+reset min_parallel_table_scan_size;
+reset parallel_tuple_cost;
+reset parallel_setup_cost;
+
+drop table semijoin_unique_tbl;
+
+create table unique_tbl_p (a int, b int) partition by range(a);
+create table unique_tbl_p1 partition of unique_tbl_p for values from (0) to (5);
+create table unique_tbl_p2 partition of unique_tbl_p for values from (5) to (10);
+create table unique_tbl_p3 partition of unique_tbl_p for values from (10) to (20);
+insert into unique_tbl_p select i%12, i from generate_series(0, 1000)i;
+create index on unique_tbl_p1(a);
+create index on unique_tbl_p2(a);
+create index on unique_tbl_p3(a);
+analyze unique_tbl_p;
+
+set enable_partitionwise_join to on;
+
+-- Ensure that the unique-ification works for partition-wise join
+explain (verbose, costs off)
+select * from unique_tbl_p t1, unique_tbl_p t2
+where (t1.a, t2.a) in (select a, a from unique_tbl_p t3)
+order by t1.a, t2.a;
+
+reset enable_partitionwise_join;
+
+drop table unique_tbl_p;
+
--
-- Test case for bug #4290: bogus calculation of subplan param sets
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..e4a9ec65ab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3159,7 +3159,6 @@ UnicodeNormalizationForm
UnicodeNormalizationQC
Unique
UniquePath
-UniquePathMethod
UniqueRelInfo
UniqueState
UnlistenStmt
@@ -3175,7 +3174,6 @@ UpgradeTaskSlotState
UpgradeTaskStep
UploadManifestCmd
UpperRelationKind
-UpperUniquePath
UserAuth
UserContext
UserMapping
--
2.43.0
[Attachment: v7-0002-Simplify-relation_has_unique_index_for.patch (application/octet-stream)]
From 5d75301d1cc042b2043e3a4a4fe3070087134bd8 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 1 Aug 2025 18:12:30 +0900
Subject: [PATCH v7 2/2] Simplify relation_has_unique_index_for()
Now that the only call to relation_has_unique_index_for() that
supplied an exprlist and oprlist has been removed, the loop handling
those lists is effectively dead code. This patch removes that loop
and simplifies the function accordingly.
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
---
src/backend/optimizer/path/indxpath.c | 85 ++++-------------------
src/backend/optimizer/plan/analyzejoins.c | 5 +-
src/include/optimizer/paths.h | 5 +-
3 files changed, 17 insertions(+), 78 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 601354ea3e0..4f5c98f0091 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4142,47 +4142,26 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* a set of equality conditions, because the conditions constrain all
* columns of some unique index.
*
- * The conditions can be represented in either or both of two ways:
- * 1. A list of RestrictInfo nodes, where the caller has already determined
- * that each condition is a mergejoinable equality with an expression in
- * this relation on one side, and an expression not involving this relation
- * on the other. The transient outer_is_left flag is used to identify which
- * side we should look at: left side if outer_is_left is false, right side
- * if it is true.
- * 2. A list of expressions in this relation, and a corresponding list of
- * equality operators. The caller must have already checked that the operators
- * represent equality. (Note: the operators could be cross-type; the
- * expressions should correspond to their RHS inputs.)
+ * The conditions are provided as a list of RestrictInfo nodes, where the
+ * caller has already determined that each condition is a mergejoinable
+ * equality with an expression in this relation on one side, and an
+ * expression not involving this relation on the other. The transient
+ * outer_is_left flag is used to identify which side we should look at:
+ * left side if outer_is_left is false, right side if it is true.
*
* The caller need only supply equality conditions arising from joins;
* this routine automatically adds in any usable baserestrictinfo clauses.
* (Note that the passed-in restrictlist will be destructively modified!)
+ *
+ * If extra_clauses isn't NULL, return baserestrictinfo clauses which were used
+ * to derive uniqueness.
*/
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist,
- List *exprlist, List *oprlist)
-{
- return relation_has_unique_index_ext(root, rel, restrictlist,
- exprlist, oprlist, NULL);
-}
-
-/*
- * relation_has_unique_index_ext
- * Same as relation_has_unique_index_for(), but supports extra_clauses
- * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
- * which were used to derive uniqueness.
- */
-bool
-relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist,
- List *exprlist, List *oprlist,
- List **extra_clauses)
+ List *restrictlist, List **extra_clauses)
{
ListCell *ic;
- Assert(list_length(exprlist) == list_length(oprlist));
-
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -4225,7 +4204,7 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
}
/* Short-circuit the easy case */
- if (restrictlist == NIL && exprlist == NIL)
+ if (restrictlist == NIL)
return false;
/* Examine each index of the relation ... */
@@ -4247,14 +4226,12 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
continue;
/*
- * Try to find each index column in the lists of conditions. This is
+ * Try to find each index column in the list of conditions. This is
* O(N^2) or worse, but we expect all the lists to be short.
*/
for (c = 0; c < ind->nkeycolumns; c++)
{
- bool matched = false;
ListCell *lc;
- ListCell *lc2;
foreach(lc, restrictlist)
{
@@ -4284,8 +4261,6 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
- matched = true; /* column is unique */
-
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
@@ -4303,43 +4278,11 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
MemoryContextSwitchTo(oldMemCtx);
}
- break;
+ break; /* found a match; column is unique */
}
}
- if (matched)
- continue;
-
- forboth(lc, exprlist, lc2, oprlist)
- {
- Node *expr = (Node *) lfirst(lc);
- Oid opr = lfirst_oid(lc2);
-
- /* See if the expression matches the index key */
- if (!match_index_to_operand(expr, c, ind))
- continue;
-
- /*
- * The equality operator must be a member of the index
- * opfamily, else it is not asserting the right kind of
- * equality behavior for this index. We assume the caller
- * determined it is an equality operator, so we don't need to
- * check any more tightly than this.
- */
- if (!op_in_opfamily(opr, ind->opfamily[c]))
- continue;
-
- /*
- * XXX at some point we may need to check collations here too.
- * For the moment we assume all collations reduce to the same
- * notion of equality.
- */
-
- matched = true; /* column is unique */
- break;
- }
-
- if (!matched)
+ if (lc == NULL)
break; /* no match; this index doesn't help us */
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 4d55c2ea591..da92d8ee414 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -990,11 +990,10 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_ext automatically adds any usable
+ * relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
- extra_clauses))
+ if (relation_has_unique_index_for(root, rel, clause_list, extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..cbade77b717 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,10 +71,7 @@ extern void generate_partitionwise_join_paths(PlannerInfo *root,
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
-extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
- List *restrictlist, List *exprlist,
- List *oprlist, List **extra_clauses);
+ List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
--
2.43.0
On Mon, Aug 18, 2025 at 3:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
Here's the updated version of the patch, which renames the macro
IS_UNIQUEIFIED_REL to RELATION_WAS_MADE_UNIQUE, and includes some
comment updates as well. I plan to push it soon, barring any
objections.
Pushed.
This patch removes the last call to make_sort_from_sortclauses(), so
I'm wondering if we can safely remove the function itself. Or should
we keep it around in case it's used by extensions or might be needed
in the future?
This function, along with two other make_xxx() functions from
createplan.c, was exported in 570be1f73 because CitusDB was using
them.
commit 570be1f73f385abb557bda15b718d7aac616cc15
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Mar 12 12:12:59 2016 -0500
Re-export a few of createplan.c's make_xxx() functions.
CitusDB is using these and don't wish to redesign their code right now.
I am not on board with this being a good idea, or a good precedent,
but I lack the energy to fight about it.
I actually agree with Tom that it's not a good idea to create Plan
nodes outside of createplan.c; instead, one should construct a Path
tree and let create_plan() convert it into Plan nodes.
I'm not sure whether CitusDB has redesigned their code in this way,
but for now, I prefer not to remove make_sort_from_sortclauses() just
to be safe.
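To illustrate the recommended pattern, here is a rough sketch (not compilable standalone; it depends on PostgreSQL planner internals, and whether create_plan() can be invoked at a given point depends on planner state) of how an extension might build a Sort via a Path tree instead of calling make_sort_from_sortclauses() to construct the Plan node directly:

```c
/*
 * Illustrative sketch only: translate the sort clauses into pathkeys,
 * wrap the input path in a SortPath, and let create_plan() produce the
 * Plan tree, rather than building a Sort Plan node by hand.
 */
#include "optimizer/pathnode.h"		/* create_sort_path() */
#include "optimizer/paths.h"		/* make_pathkeys_for_sortclauses() */
#include "optimizer/planmain.h"		/* create_plan() */

static Plan *
sort_plan_via_path_tree(PlannerInfo *root, RelOptInfo *rel,
						Path *subpath, List *sortclauses, List *tlist)
{
	List	   *pathkeys;
	SortPath   *spath;

	/* derive pathkeys from the SortGroupClause list */
	pathkeys = make_pathkeys_for_sortclauses(root, sortclauses, tlist);

	/* build a SortPath atop the given subpath; -1 means no LIMIT hint */
	spath = create_sort_path(root, rel, subpath, pathkeys, -1.0);

	/* let the core planner turn the Path tree into Plan nodes */
	return create_plan(root, (Path *) spath);
}
```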
Thanks
Richard
On Wed, Jul 23, 2025 at 5:11 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
As a very trivial test on this patch, I ran the query in your opening
email, both with and without the patch, scaling up the size of the table
a little bit.
This is a really nice improvement. I think we could find queries that
are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
nodes.
FWIW, I went looking for a query that better showcases the performance
improvement from this patch. Here is one I found.
create table t (a int, b int);
insert into t select i%10, i%10 from generate_series(1,50000) i;
create index on t (a, b);
analyze t;
explain (analyze, costs on)
select * from t t1, t t2 where (t1.a, t2.b) in (select a, b from t t3)
order by t1.a, t2.b;
Here are the planning and execution times on my snail-paced machine
(best of 3), without and with this patch.
-- without this patch
Planning Time: 0.850 ms
Execution Time: 108149.907 ms
-- with this patch
Planning Time: 0.728 ms
Execution Time: 29229.748 ms
So this specific case runs about 3.7 times faster, which is really
nice.
- Richard
On 2/9/2025 12:10, Richard Guo wrote:
So this specific case runs about 3.7 times faster, which is really
nice.

No questions, it is a good enough optimisation. I'm worried only about
implementation: It creates one more RelOptInfo that may look like a
baserel, but we can't find it by find_base_rel or even find_join_rel. It
seems a little inconsistent to me.
Don't think it is critical - just complicates life for extension
developers in some cases.
--
regards, Andrei Lepikhov
On Tue, Sep 2, 2025 at 7:56 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
No questions, it is good enough optimisation. I'm worried only about
implementation: It creates one more RelOptInfo that may look like a
baserel, but we can't find it by find_base_rel or even find_join_rel. It
seems a little inconsistent to me.
Don't think it is critical - just complicates life for extension
developers in some cases.
The RelOptInfo representing the unique-ified rel is intended to be
used only internally during path generation for semi-joins, and should
be opaque outside of that. I don't think extensions should know about
it.
- Richard
On 3/9/2025 11:12, Richard Guo wrote:
On Tue, Sep 2, 2025 at 7:56 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
No questions, it is good enough optimisation. I'm worried only about
implementation: It creates one more RelOptInfo that may look like a
baserel, but we can't find it by find_base_rel or even find_join_rel. It
seems a little inconsistent to me.
Don't think it is critical - just complicates life for extension
developers in some cases.The RelOptInfo representing the unique-ified rel is intended to be
used only internally during path generation for semi-joins, and should
be opaque outside of that. I don't think extensions should know about
it.
I just stated the fact - it is not for debate ;).
To understand how deeply developers utilise the core, take a look at
pg_hint_plan. The extensibility of the Postgres planner isn't flexible
enough (and will never be) for the developers' purposes. So, they
exploit every exported function, variable and type.
Every feature intended to be hidden from extensions should be wrapped
into an internal type, not exposed in .h files.
It leads us to a discussion about the voice of extension developers in
the core decisions. I think it is worth debating at one of the
conferences in the near future.
--
regards, Andrei Lepikhov